I recently decided to pursue the Imperial Business Analytics programme offered by Executive Education at Imperial College Business School. The programme requires a final project, and I chose a topic I am passionate about: telecommunications and internet experience. COVID-19 created unprecedented challenges for data networks. As in many other countries, the Colombian government imposed a strict lockdown in response to the COVID-19 emergency. In a very short time, and without preparation, Colombian society had to adapt so that a significant part of the economy could still function without people leaving home. This adaptation required an accelerated adoption of digital tools:
- Many companies were forced to adapt so that their employees could work from home.
- Schools were forced to close and offer remote learning.
- There was an unprecedented surge in streaming and internet-based entertainment services.

The adoption of digital and remote tools resulted in an unforeseen increase in data traffic. Networks became more critical than ever so that society could keep functioning, at least to some degree.
Network operators were forced to cope with this traffic spike without proper preparation. Capacity expansion processes normally ran in yearly cycles, using the previous 12-month traffic window as an input to predict near-term needs. Many people believe that operators enjoyed a boost in revenue and profits because of the increase in traffic, as internet-based companies such as Amazon or Zoom did. However, operators' public results have shown the opposite: their revenue and cash collection fell because of a rise in non-payment, overdue accounts, and cancellations. This financial deterioration further constrained operators' capacity to expand their networks to match the traffic increase.
My project analyzed Colombia (my home country) to understand whether this traffic increase degraded user experience, and to identify the municipalities that suffered the most, so that public policy can be adapted to support investments in network infrastructure there.
The project assessed how the unexpected COVID-19 traffic surge impacted internet experience KPIs (average downlink speed, average uplink speed, latency) in every municipality in Colombia compared to the pre-COVID-19 baseline. The assessment analyzed the time series of aggregated traffic in Colombia to confirm the historical rise in traffic, measured the impact of the surge on user experience (plausibly caused by network saturation), classified municipalities by how severely user experience was affected, and forecasted possible traffic trends after the economy re-opened. The results could help public policy makers decide where incentives for network infrastructure investment should be prioritized, and which technologies should be incentivized.
Anaconda was the tool used during most of the Imperial Analytics Programme. I first tried to run the necessary dataset transformations with Anaconda on my local machine; after several memory crashes I realized I would need a cloud platform. I had previously experimented with IBM Cloud while pursuing my IBM Data Science Certification: IBM Cloud is a practical, intuitive platform that is easy to use even for beginners. Its free tier provides an environment with up to 50 capacity-unit-hours per month, where a 4 vCPU + 16 GB RAM environment consumes 2 capacity units per hour. However, I estimated that I would need more than 16 GB of RAM to run my code, so I decided to try a different platform that I have been using in my current job: Google Cloud Platform. GCP lets any user set up an account and start a 90-day trial with 300 USD in credits. It is very simple to set up a standard instance with 4 vCPUs and 32 GB of RAM, which was sufficient computing power for my project.
I was amazed by the amount of public data available for my project. I was surprised to see that Ookla Speedtest has made public a dataset containing aggregated, anonymized, geo-referenced data on downlink speed, uplink speed, and latency. I was also happy to see that the Colombian government's initiative to consolidate a repository of public data, "datos.gov.co", has advanced and now includes many datasets from the Ministry of Telecommunications. These sources, which are completely public and available to anyone, would be sufficient to perform my analysis as originally envisioned. I specifically included the following sources of data:
import os
import matplotlib.pyplot as plt
import geopandas as gpd
import earthpy as et
import geoplot as gpl
import shapefile as shp
import mapclassify as mpclas
import numpy as np
import pandas as pd
import seaborn as sns
from scipy import stats
from sklearn import svm
# Speed test fixed Q1 dataset:
st_fixed_q1 = gpd.read_file('gs://imperial_analytics_project/sources/1_speed_test_titles/fixed_q1_20/gps_fixed_tiles.shp')
print("Speed test fixed Q1 data loaded Shape:", st_fixed_q1.shape)
display(st_fixed_q1.head())
# Speed test fixed Q2 dataset:
st_fixed_q2 = gpd.read_file('gs://imperial_analytics_project/sources/1_speed_test_titles/fixed_q2_20/gps_fixed_tiles.shp')
print("Speed test fixed Q2 data loaded Shape:", st_fixed_q2.shape)
display(st_fixed_q2.head())
# Speed test fixed Q3 dataset:
st_fixed_q3 = gpd.read_file('gs://imperial_analytics_project/sources/1_speed_test_titles/fixed_q3_20/gps_fixed_tiles.shp')
print("Speed test fixed Q3 data loaded Shape:", st_fixed_q3.shape)
display(st_fixed_q3.head())
#DANE GDP by municipality
gdp_mun = pd.read_csv('gs://imperial_analytics_project/sources/5_dane_gdp/dane_gdp_per_mun.csv')
print("GDP by municipality data loaded Shape:", gdp_mun.shape)
display(gdp_mun.head(2))
# Speed test mobile Q1 dataset:
st_mob_q1 = gpd.read_file('gs://imperial_analytics_project/sources/1_speed_test_titles/mobile_q1_20/gps_mobile_tiles.shp')
print("Speed test mobile Q1 data loaded Shape:", st_mob_q1.shape)
display(st_mob_q1.head())
# Speed test mobile Q2 dataset:
st_mob_q2 = gpd.read_file('gs://imperial_analytics_project/sources/1_speed_test_titles/mobile_q2_20/gps_mobile_tiles.shp')
print("Speed test mobile Q2 data loaded Shape:", st_mob_q2.shape)
display(st_mob_q2.head())
# Speed test mobile Q3 dataset:
st_mob_q3 = gpd.read_file('gs://imperial_analytics_project/sources/1_speed_test_titles/mobile_q3_20/gps_mobile_tiles.shp')
print("Speed test mobile Q3 data loaded Shape:", st_mob_q3.shape)
display(st_mob_q3.head())
#Show all columns
pd.set_option('display.max_columns', None)
#Municipality SHP files
mun_shp = gpd.read_file('gs://imperial_analytics_project/sources/2_dane_shp_mun/MGN_MPIO_POLITICO.shp')
print("Municipalities geo-shape data loaded Shape:", mun_shp.shape)
display(mun_shp.head(2))
#Aggregated traffic per day per operator
agg_traf_op = pd.read_csv('gs://imperial_analytics_project/sources/3_mintic_agg_traff_day/agg_traf_apr_dec.csv')
agg_traf_base = pd.read_csv('gs://imperial_analytics_project/sources/3_mintic_agg_traff_day/agg_traf_base.csv')
print("Aggregated traffic per day per operator data loaded Shape:", agg_traf_op.shape)
display(agg_traf_op.head(2))
print("Aggregated traffic per day per operator base loaded Shape:", agg_traf_base.shape)
display(agg_traf_base.head(2))
#Fixed Access Penetration by municipality
fixed_acc_pen = pd.read_csv('gs://imperial_analytics_project/sources/4_mintic_fixed_mobile_by_mun/fixed_acc_by_mun_pen_v4.csv')
print("Fixed access penetration by municipality data loaded Shape:", fixed_acc_pen.shape)
display(fixed_acc_pen.head(2))
#Fixed Access Number of Accesses by Type by municipality
fixed_acc_type = pd.read_csv('gs://imperial_analytics_project/sources/4_mintic_fixed_mobile_by_mun/fixed_acc_by_mun_type_v4.csv')
print("Fixed Access Number of Accesses by Type by municipality data loaded Shape:", fixed_acc_type.shape)
display(fixed_acc_type.head(2))
#Mobile Access Coverage by Technology by municipality
mobile_cov_tech = pd.read_csv('gs://imperial_analytics_project/sources/4_mintic_fixed_mobile_by_mun/mobile_cov_per_tech_by_mun_v4.csv')
print("Mobile Access Coverage by Technology by municipality data loaded Shape:", mobile_cov_tech.shape)
display(mobile_cov_tech.head(2))
The analysis requires all datasets to be joined into a single, geo-referenced dataset that can be used to compare municipalities across quarters. To do so, I followed these steps to prepare the data:
As a result of the dataset transformation, two geo-located dataframes (SHP files) are generated:
I used GeoPandas' sjoin function to assign speed test measurements to Colombian municipalities. sjoin is a powerful function: it effectively joined the tables and assigned the roughly 30k tiles (out of more than 6 million worldwide) that fall within a Colombian municipality. Its drawback is that it is rather slow, as it is not the most efficient way to process large geo-referenced datasets. If you are interested in optimizing geo-referenced processing, you can check this blog post by dask.org (https://blog.dask.org/2017/09/21/accelerating-geopandas-1). It took a while, but GeoPandas did the work and generated 6 dataframes with municipality SHP geometry (one per quarter, for mobile and fixed data).
# Fixed Q1 sjoin
sjoin_fix_q1 = gpd.sjoin(mun_shp, st_fixed_q1, how='inner', op='intersects', lsuffix='left', rsuffix='right')
print("Speed test fixed Q1 sjoin with municipality SHP files data loaded Shape:", sjoin_fix_q1.shape)
display(sjoin_fix_q1.head(2))
# Fixed Q2 sjoin
sjoin_fix_q2 = gpd.sjoin(mun_shp, st_fixed_q2, how='inner', op='intersects', lsuffix='left', rsuffix='right')
print("Speed test fixed Q2 sjoin with municipality SHP files data loaded Shape:", sjoin_fix_q2.shape)
display(sjoin_fix_q2.head(2))
# Fixed Q3 sjoin
sjoin_fix_q3 = gpd.sjoin(mun_shp, st_fixed_q3, how='inner', op='intersects', lsuffix='left', rsuffix='right')
print("Speed test fixed Q3 sjoin with municipality SHP files data loaded Shape:", sjoin_fix_q3.shape)
display(sjoin_fix_q3.head(2))
# Mobile Q1 sjoin
sjoin_mob_q1 = gpd.sjoin(mun_shp, st_mob_q1, how='inner', op='intersects', lsuffix='left', rsuffix='right')
print("Speed test mobile Q1 sjoin with municipality SHP files data loaded Shape:", sjoin_mob_q1.shape)
display(sjoin_mob_q1.head(2))
# Mobile Q2 sjoin
sjoin_mob_q2 = gpd.sjoin(mun_shp, st_mob_q2, how='inner', op='intersects', lsuffix='left', rsuffix='right')
print("Speed test mobile Q2 sjoin with municipality SHP files data loaded Shape:", sjoin_mob_q2.shape)
display(sjoin_mob_q2.head(2))
# Mobile Q3 sjoin
sjoin_mob_q3 = gpd.sjoin(mun_shp, st_mob_q3, how='inner', op='intersects', lsuffix='left', rsuffix='right')
print("Speed test mobile Q3 sjoin with municipality SHP files data loaded Shape:", sjoin_mob_q3.shape)
display(sjoin_mob_q3.head(2))
#Save each join as backup
sjoin_fix_q1.to_file("gs://imperial_analytics_project/backup/sjoin_fix_q1/sjoin_fix_q1.shp")
sjoin_fix_q2.to_file("gs://imperial_analytics_project/backup/sjoin_fix_q2/sjoin_fix_q2.shp")
sjoin_fix_q3.to_file("gs://imperial_analytics_project/backup/sjoin_fix_q3/sjoin_fix_q3.shp")
sjoin_mob_q1.to_file("gs://imperial_analytics_project/backup/sjoin_mob_q1/sjoin_mob_q1.shp")
sjoin_mob_q2.to_file("gs://imperial_analytics_project/backup/sjoin_mob_q2/sjoin_mob_q2.shp")
sjoin_mob_q3.to_file("gs://imperial_analytics_project/backup/sjoin_mob_q3/sjoin_mob_q3.shp")
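Conceptually, the core geometric test behind sjoin predicates such as `intersects` is point-in-polygon containment. A minimal, pure-Python ray-casting sketch illustrates the idea (this is an illustration only, not GeoPandas' actual implementation, which uses compiled geometry routines plus an R-tree spatial index; the square "municipality" below is made up):

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: a point is inside a polygon if a horizontal ray
    cast from the point crosses the polygon's edges an odd number of times."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the ray's y-coordinate?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# A hypothetical square "municipality" boundary
municipality = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
```

gpd.sjoin performs this kind of test over millions of geometries, using the spatial index to avoid comparing every tile against every municipality.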
I first reconstruct the original distribution of measurements by repeating each tile's row once per test in each quarterly dataset, so that the municipality averages are weighted by the number of tests. I then perform a groupby using the concatenation of the department code and the municipality code as the groupby key.
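The effect of the repeat-then-average step can be checked on a toy example (the municipality key, speeds, and test counts below are invented for illustration):

```python
import pandas as pd

# Two tiles in one municipality: 9 tests averaging 100 kbps, 1 test at 10 kbps
tiles = pd.DataFrame({'mun': ['A', 'A'],
                      'avg_d_kbps': [100.0, 10.0],
                      'tests': [9, 1]})

# Repeat each row once per test, so a plain groupby mean becomes a weighted mean
repeated = tiles.loc[tiles.index.repeat(tiles['tests'])]
weighted = repeated.groupby('mun')['avg_d_kbps'].mean()
```

A plain mean over the two tiles would give 55; the repeat makes the heavily-tested tile dominate, yielding (9 × 100 + 1 × 10) / 10 = 91.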
# Create unique key for groupby
sjoin_fix_q1['DPTO_MPIO'] = sjoin_fix_q1['DPTO_CCDGO'] + "_" + sjoin_fix_q1['MPIO_CCDGO']
# Repeat each tile row by its number of tests so averages are test-weighted
sjoin_fix_q1r = sjoin_fix_q1.sort_values(by = 'quadkey').reset_index(drop=True)
sjoin_fix_q1r = sjoin_fix_q1r.loc[sjoin_fix_q1r.index.repeat(sjoin_fix_q1r.tests)]
print("Fixed Q1 sjoin repeat by municipality generated Shape:", sjoin_fix_q1r.shape)
display(sjoin_fix_q1r.head(2))
sjoin_fix_q1r_avg = sjoin_fix_q1r[['DPTO_MPIO', 'avg_d_kbps', 'avg_u_kbps', 'avg_lat_ms']].groupby(['DPTO_MPIO']).mean().rename(columns={'avg_d_kbps': 'avg_d_kbps_q1', 'avg_u_kbps': 'avg_u_kbps_q1', 'avg_lat_ms': 'avg_lat_ms_q1'})
sjoin_fix_q1r_std = sjoin_fix_q1r[['DPTO_MPIO', 'avg_d_kbps', 'avg_u_kbps', 'avg_lat_ms']].groupby(by = ['DPTO_MPIO']).std().rename(columns={'avg_d_kbps': 'std_d_kbps_q1', 'avg_u_kbps': 'std_u_kbps_q1', 'avg_lat_ms': 'std_lat_ms_q1'})
sjoin_fix_q1_sum = sjoin_fix_q1[['DPTO_MPIO','tests', 'devices']].groupby(['DPTO_MPIO']).sum().rename(columns={'tests': 'tests_q1', 'devices': 'devices_q1'})
sjoin_fix_q1r_group = sjoin_fix_q1r_avg.join(sjoin_fix_q1r_std)
sjoin_fix_q1r_group = sjoin_fix_q1r_group.join(sjoin_fix_q1_sum)
print("Fixed Q1 groupby by municipality generated Shape:", sjoin_fix_q1r_group.shape)
display(sjoin_fix_q1r_group.head(2))
# Create unique key for groupby
sjoin_fix_q2['DPTO_MPIO'] = sjoin_fix_q2['DPTO_CCDGO'] + "_" + sjoin_fix_q2['MPIO_CCDGO']
# Repeat each tile row by its number of tests so averages are test-weighted
sjoin_fix_q2r = sjoin_fix_q2.sort_values(by = 'quadkey').reset_index(drop=True)
sjoin_fix_q2r = sjoin_fix_q2r.loc[sjoin_fix_q2r.index.repeat(sjoin_fix_q2r.tests)]
print("Fixed Q2 sjoin repeat by municipality generated Shape:", sjoin_fix_q2r.shape)
display(sjoin_fix_q2r.head(2))
sjoin_fix_q2r_avg = sjoin_fix_q2r[['DPTO_MPIO', 'avg_d_kbps', 'avg_u_kbps', 'avg_lat_ms']].groupby(['DPTO_MPIO']).mean().rename(columns={'avg_d_kbps': 'avg_d_kbps_q2', 'avg_u_kbps': 'avg_u_kbps_q2', 'avg_lat_ms': 'avg_lat_ms_q2'})
sjoin_fix_q2r_std = sjoin_fix_q2r[['DPTO_MPIO', 'avg_d_kbps', 'avg_u_kbps', 'avg_lat_ms']].groupby(by = ['DPTO_MPIO']).std().rename(columns={'avg_d_kbps': 'std_d_kbps_q2', 'avg_u_kbps': 'std_u_kbps_q2', 'avg_lat_ms': 'std_lat_ms_q2'})
sjoin_fix_q2_sum = sjoin_fix_q2[['DPTO_MPIO','tests', 'devices']].groupby(['DPTO_MPIO']).sum().rename(columns={'tests': 'tests_q2', 'devices': 'devices_q2'})
sjoin_fix_q2r_group = sjoin_fix_q2r_avg.join(sjoin_fix_q2r_std)
sjoin_fix_q2r_group = sjoin_fix_q2r_group.join(sjoin_fix_q2_sum)
print("Fixed Q2 groupby by municipality generated Shape:", sjoin_fix_q2r_group.shape)
display(sjoin_fix_q2r_group.head(2))
# Create unique key for groupby
sjoin_fix_q3['DPTO_MPIO'] = sjoin_fix_q3['DPTO_CCDGO'] + "_" + sjoin_fix_q3['MPIO_CCDGO']
# Repeat each tile row by its number of tests so averages are test-weighted
sjoin_fix_q3r = sjoin_fix_q3.sort_values(by = 'quadkey').reset_index(drop=True)
sjoin_fix_q3r = sjoin_fix_q3r.loc[sjoin_fix_q3r.index.repeat(sjoin_fix_q3r.tests)]
print("Fixed Q3 sjoin repeat by municipality generated Shape:", sjoin_fix_q3r.shape)
display(sjoin_fix_q3r.head(2))
sjoin_fix_q3r_avg = sjoin_fix_q3r[['DPTO_MPIO', 'avg_d_kbps', 'avg_u_kbps', 'avg_lat_ms']].groupby(['DPTO_MPIO']).mean().rename(columns={'avg_d_kbps': 'avg_d_kbps_q3', 'avg_u_kbps': 'avg_u_kbps_q3', 'avg_lat_ms': 'avg_lat_ms_q3'})
sjoin_fix_q3r_std = sjoin_fix_q3r[['DPTO_MPIO', 'avg_d_kbps', 'avg_u_kbps', 'avg_lat_ms']].groupby(by = ['DPTO_MPIO']).std().rename(columns={'avg_d_kbps': 'std_d_kbps_q3', 'avg_u_kbps': 'std_u_kbps_q3', 'avg_lat_ms': 'std_lat_ms_q3'})
sjoin_fix_q3_sum = sjoin_fix_q3[['DPTO_MPIO','tests', 'devices']].groupby(['DPTO_MPIO']).sum().rename(columns={'tests': 'tests_q3', 'devices': 'devices_q3'})
sjoin_fix_q3r_group = sjoin_fix_q3r_avg.join(sjoin_fix_q3r_std)
sjoin_fix_q3r_group = sjoin_fix_q3r_group.join(sjoin_fix_q3_sum)
print("Fixed Q3 groupby by municipality generated Shape:", sjoin_fix_q3r_group.shape)
display(sjoin_fix_q3r_group.head(2))
# Create unique key for groupby
sjoin_mob_q1['DPTO_MPIO'] = sjoin_mob_q1['DPTO_CCDGO'] + "_" + sjoin_mob_q1['MPIO_CCDGO']
# Repeat each tile row by its number of tests so averages are test-weighted
sjoin_mob_q1r = sjoin_mob_q1.sort_values(by = 'quadkey').reset_index(drop=True)
sjoin_mob_q1r = sjoin_mob_q1r.loc[sjoin_mob_q1r.index.repeat(sjoin_mob_q1r.tests)]
print("Mobile Q1 sjoin repeat by municipality generated Shape:", sjoin_mob_q1r.shape)
display(sjoin_mob_q1r.head(2))
sjoin_mob_q1r_avg = sjoin_mob_q1r[['DPTO_MPIO', 'avg_d_kbps', 'avg_u_kbps', 'avg_lat_ms']].groupby(['DPTO_MPIO']).mean().rename(columns={'avg_d_kbps': 'avg_d_kbps_q1', 'avg_u_kbps': 'avg_u_kbps_q1', 'avg_lat_ms': 'avg_lat_ms_q1'})
sjoin_mob_q1r_std = sjoin_mob_q1r[['DPTO_MPIO', 'avg_d_kbps', 'avg_u_kbps', 'avg_lat_ms']].groupby(by = ['DPTO_MPIO']).std().rename(columns={'avg_d_kbps': 'std_d_kbps_q1', 'avg_u_kbps': 'std_u_kbps_q1', 'avg_lat_ms': 'std_lat_ms_q1'})
sjoin_mob_q1_sum = sjoin_mob_q1[['DPTO_MPIO','tests', 'devices']].groupby(['DPTO_MPIO']).sum().rename(columns={'tests': 'tests_q1', 'devices': 'devices_q1'})
sjoin_mob_q1r_group = sjoin_mob_q1r_avg.join(sjoin_mob_q1r_std)
sjoin_mob_q1r_group = sjoin_mob_q1r_group.join(sjoin_mob_q1_sum)
print("Mobile Q1 groupby by municipality generated Shape:", sjoin_mob_q1r_group.shape)
display(sjoin_mob_q1r_group.head(2))
#sjoin_mob_q1r.loc[(sjoin_mob_q1r['DPTO_MPIO'] == '05_001')]['tests'].sum()
#sjoin_mob_q1.loc[(sjoin_mob_q1['DPTO_CCDGO'] == '05') & (sjoin_mob_q1['MPIO_CCDGO'] == '001')]
#sjoin_mob_q1.loc[(sjoin_mob_q1['DPTO_CCDGO'] == '05') & (sjoin_mob_q1['MPIO_CCDGO'] == '001')]['devices'].sum()
# Create unique key for groupby
sjoin_mob_q2['DPTO_MPIO'] = sjoin_mob_q2['DPTO_CCDGO'] + "_" + sjoin_mob_q2['MPIO_CCDGO']
# Repeat each tile row by its number of tests so averages are test-weighted
sjoin_mob_q2r = sjoin_mob_q2.sort_values(by = 'quadkey').reset_index(drop=True)
sjoin_mob_q2r = sjoin_mob_q2r.loc[sjoin_mob_q2r.index.repeat(sjoin_mob_q2r.tests)]
print("Mobile Q2 sjoin repeat by municipality generated Shape:", sjoin_mob_q2r.shape)
display(sjoin_mob_q2r.head(2))
sjoin_mob_q2r_avg = sjoin_mob_q2r[['DPTO_MPIO', 'avg_d_kbps', 'avg_u_kbps', 'avg_lat_ms']].groupby(['DPTO_MPIO']).mean().rename(columns={'avg_d_kbps': 'avg_d_kbps_q2', 'avg_u_kbps': 'avg_u_kbps_q2', 'avg_lat_ms': 'avg_lat_ms_q2'})
sjoin_mob_q2r_std = sjoin_mob_q2r[['DPTO_MPIO', 'avg_d_kbps', 'avg_u_kbps', 'avg_lat_ms']].groupby(by = ['DPTO_MPIO']).std().rename(columns={'avg_d_kbps': 'std_d_kbps_q2', 'avg_u_kbps': 'std_u_kbps_q2', 'avg_lat_ms': 'std_lat_ms_q2'})
sjoin_mob_q2_sum = sjoin_mob_q2[['DPTO_MPIO','tests', 'devices']].groupby(['DPTO_MPIO']).sum().rename(columns={'tests': 'tests_q2', 'devices': 'devices_q2'})
sjoin_mob_q2r_group = sjoin_mob_q2r_avg.join(sjoin_mob_q2r_std)
sjoin_mob_q2r_group = sjoin_mob_q2r_group.join(sjoin_mob_q2_sum)
print("Mobile Q2 groupby by municipality generated Shape:", sjoin_mob_q2r_group.shape)
display(sjoin_mob_q2r_group.head(2))
# Create unique key for groupby
sjoin_mob_q3['DPTO_MPIO'] = sjoin_mob_q3['DPTO_CCDGO'] + "_" + sjoin_mob_q3['MPIO_CCDGO']
# Repeat each tile row by its number of tests so averages are test-weighted
sjoin_mob_q3r = sjoin_mob_q3.sort_values(by = 'quadkey').reset_index(drop=True)
sjoin_mob_q3r = sjoin_mob_q3r.loc[sjoin_mob_q3r.index.repeat(sjoin_mob_q3r.tests)]
print("Mobile Q3 sjoin repeat by municipality generated Shape:", sjoin_mob_q3r.shape)
display(sjoin_mob_q3r.head(2))
sjoin_mob_q3r_avg = sjoin_mob_q3r[['DPTO_MPIO', 'avg_d_kbps', 'avg_u_kbps', 'avg_lat_ms']].groupby(['DPTO_MPIO']).mean().rename(columns={'avg_d_kbps': 'avg_d_kbps_q3', 'avg_u_kbps': 'avg_u_kbps_q3', 'avg_lat_ms': 'avg_lat_ms_q3'})
sjoin_mob_q3r_std = sjoin_mob_q3r[['DPTO_MPIO', 'avg_d_kbps', 'avg_u_kbps', 'avg_lat_ms']].groupby(by = ['DPTO_MPIO']).std().rename(columns={'avg_d_kbps': 'std_d_kbps_q3', 'avg_u_kbps': 'std_u_kbps_q3', 'avg_lat_ms': 'std_lat_ms_q3'})
sjoin_mob_q3_sum = sjoin_mob_q3[['DPTO_MPIO','tests', 'devices']].groupby(['DPTO_MPIO']).sum().rename(columns={'tests': 'tests_q3', 'devices': 'devices_q3'})
sjoin_mob_q3r_group = sjoin_mob_q3r_avg.join(sjoin_mob_q3r_std)
sjoin_mob_q3r_group = sjoin_mob_q3r_group.join(sjoin_mob_q3_sum)
print("Mobile Q3 groupby by municipality generated Shape:", sjoin_mob_q3r_group.shape)
display(sjoin_mob_q3r_group.head(2))
# Fixed
fixed_consolidated = sjoin_fix_q1r_group.join(sjoin_fix_q2r_group, how = "left")
fixed_consolidated = fixed_consolidated.join(sjoin_fix_q3r_group, how = "inner")
print("Fixed consolidated generated Shape:", fixed_consolidated.shape)
display(fixed_consolidated.head())
display(fixed_consolidated.describe())
#Mobile
mobile_consolidated = sjoin_mob_q1r_group.join(sjoin_mob_q2r_group, how = "left")
mobile_consolidated = mobile_consolidated.join(sjoin_mob_q3r_group, how = "inner")
print("Mobile consolidated generated Shape:", mobile_consolidated.shape)
display(mobile_consolidated.head())
display(mobile_consolidated.describe())
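Note that the quarter-by-quarter join keeps only municipalities observed in every quarter. A small sketch with hypothetical DPTO_MPIO keys shows how the inner join drops a municipality that is missing from one quarter:

```python
import pandas as pd

# Hypothetical per-quarter aggregates indexed by DPTO_MPIO
q1 = pd.DataFrame({'avg_d_kbps_q1': [10.0, 20.0]}, index=['05_001', '08_001'])
q3 = pd.DataFrame({'avg_d_kbps_q3': [12.0]}, index=['05_001'])

# Index-aligned join; how='inner' keeps only municipalities present in both
both = q1.join(q3, how='inner')
```

Here '08_001' has no Q3 measurements, so it disappears from the consolidated table rather than carrying NaN KPIs forward.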
#fixed_consolidated.to_csv('gs://imperial_analytics_project/backup/fixed_consolidated/fixed_consolidated.csv')
#sjoin_fix_q1r_group.to_csv('gs://imperial_analytics_project/backup/fixed_consolidated/sjoin_fix_q1r_group.csv')
#sjoin_fix_q2r_group.to_csv('gs://imperial_analytics_project/backup/fixed_consolidated/sjoin_fix_q2r_group.csv')
#sjoin_fix_q3r_group.to_csv('gs://imperial_analytics_project/backup/fixed_consolidated/sjoin_fix_q3r_group.csv')
fixed_consolidated = pd.read_csv('gs://imperial_analytics_project/backup/fixed_consolidated/fixed_consolidated.csv', index_col='DPTO_MPIO')
#sjoin_fix_q1r_group = pd.read_csv('gs://imperial_analytics_project/backup/fixed_consolidated/sjoin_fix_q1r_group.csv', index_col='DPTO_MPIO')
#sjoin_fix_q2r_group = pd.read_csv('gs://imperial_analytics_project/backup/fixed_consolidated/sjoin_fix_q2r_group.csv', index_col='DPTO_MPIO')
#sjoin_fix_q3r_group = pd.read_csv('gs://imperial_analytics_project/backup/fixed_consolidated/sjoin_fix_q3r_group.csv', index_col='DPTO_MPIO')
#print("Fixed Q1 loaded Shape:", sjoin_fix_q1r_group.shape)
#print("Fixed Q2 loaded Shape:", sjoin_fix_q2r_group.shape)
#print("Fixed Q3 loaded Shape:", sjoin_fix_q3r_group.shape)
#display(sjoin_fix_q1r_group.head(2))
print("Fixed consolidated generated Shape:", fixed_consolidated.shape)
display(fixed_consolidated.head(2))
#Backup and load consolidated data
#mobile_consolidated.to_csv('gs://imperial_analytics_project/backup/mobile_consolidated/mobile_consolidated.csv')
#sjoin_mob_q1r_group.to_csv('gs://imperial_analytics_project/backup/mobile_consolidated/sjoin_mob_q1r_group.csv')
#sjoin_mob_q2r_group.to_csv('gs://imperial_analytics_project/backup/mobile_consolidated/sjoin_mob_q2r_group.csv')
#sjoin_mob_q3r_group.to_csv('gs://imperial_analytics_project/backup/mobile_consolidated/sjoin_mob_q3r_group.csv')
mobile_consolidated = pd.read_csv('gs://imperial_analytics_project/backup/mobile_consolidated/mobile_consolidated.csv', index_col='DPTO_MPIO')
#sjoin_mob_q1r_group = pd.read_csv('gs://imperial_analytics_project/backup/mobile_consolidated/sjoin_mob_q1r_group.csv', index_col='DPTO_MPIO')
#sjoin_mob_q2r_group = pd.read_csv('gs://imperial_analytics_project/backup/mobile_consolidated/sjoin_mob_q2r_group.csv', index_col='DPTO_MPIO')
#sjoin_mob_q3r_group = pd.read_csv('gs://imperial_analytics_project/backup/mobile_consolidated/sjoin_mob_q3r_group.csv', index_col='DPTO_MPIO')
#print("Mobile Q1 loaded Shape:", sjoin_mob_q1r_group.shape)
#print("Mobile Q2 loaded Shape:", sjoin_mob_q2r_group.shape)
#print("Mobile Q3 loaded Shape:", sjoin_mob_q3r_group.shape)
#display(sjoin_mob_q1r_group.head(2))
print("Mobile consolidated generated Shape:", mobile_consolidated.shape)
display(mobile_consolidated.head(2))
#Join key for municipalities dataset
mun_shp_join = mun_shp.drop_duplicates().copy()  # copy to avoid SettingWithCopyWarning
mun_shp_join['DPTO_MPIO'] = mun_shp_join['DPTO_CCDGO'] + "_" + mun_shp_join['MPIO_CCDGO']
mun_shp_join = mun_shp_join.set_index('DPTO_MPIO')
mun_shp_join['MPIO_CCDGO'] = mun_shp_join['MPIO_CCDGO'].astype('int64')
mun_shp_fix = mun_shp_join[mun_shp_join.index.isin(fixed_consolidated.index)]
fixed_consolidated_geo = mun_shp_fix.join(fixed_consolidated, how = 'inner')
fixed_consolidated_geo.reset_index(inplace = True)
print("Fixed geo & consolidated generated Shape:", fixed_consolidated_geo.shape)
display(fixed_consolidated_geo.head(2))
mun_shp_mob = mun_shp_join[mun_shp_join.index.isin(mobile_consolidated.index)]
mobile_consolidated_geo = mun_shp_mob.join(mobile_consolidated, how = 'inner')
mobile_consolidated_geo.reset_index(inplace = True)
print("Mobile geo & consolidated generated Shape:", mobile_consolidated_geo.shape)
display(mobile_consolidated_geo.head(2))
#Backup fixed and mobile consolidated SHP datasets
#fixed_consolidated_geo.to_file("backup/fixed_consolidated/fixed_consolidated.shp")
#mobile_consolidated_geo.to_file("gs://imperial_analytics_project/backup/mobile_consolidated/mobile_consolidated.shp")
#backup list of municipalities
fixed_consolidated_geo[['DPTO_MPIO', 'MPIO_CNMBR', 'DPTO_CNMBR']].to_csv("gs://imperial_analytics_project/backup/fixed_consolidated/fixed_consolidated_list_mcpio.csv")
mobile_consolidated_geo[['DPTO_MPIO', 'MPIO_CNMBR', 'DPTO_CNMBR']].to_csv("gs://imperial_analytics_project/backup/mobile_consolidated/mobile_consolidated_list_mcpio.csv")
display(fixed_acc_pen.head(1))
display(fixed_acc_type.head(1))
display(mobile_cov_tech.tail(1))
# Prepare data
#Prepare GDP data
gdp_mun['GDP'] = gdp_mun['GDP'].str.replace(',','').astype('float')
display(gdp_mun.head(2))
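As an alternative to stripping the separators after loading, pandas can parse thousands separators at read time via read_csv's thousands parameter; the column names and values below are invented for illustration:

```python
import io
import pandas as pd

csv = io.StringIO('MUNICIPALITY,GDP\nA,"1,234"\nB,"567"\n')
gdp_example = pd.read_csv(csv, thousands=',')
# GDP arrives as a numeric column, so no str.replace/astype step is needed
```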
# Fixed access penetration and population
fixed_acc_pen.rename(columns = {'No. ACCESOS FIJOS A INTERNET': 'NUM_INTERNET_ACCESS', 'POBLACIÓN DANE' : 'POPULATION', 'INDICE':'FIX_INT_PENETRATION'}, inplace = True)
fixed_acc_pen['FIX_INT_PENETRATION'] = fixed_acc_pen['FIX_INT_PENETRATION'].str.replace(',','.').astype('float')
fix_acc_pen_q1 = fixed_acc_pen.loc[((fixed_acc_pen['AÑO'] == 2020) & (fixed_acc_pen['TRIMESTRE'] == 1))][['NUM_INTERNET_ACCESS', 'POPULATION', 'FIX_INT_PENETRATION', 'DPTO_MPIO']].sort_values(by=['DPTO_MPIO'])
fix_acc_pen_q3 = fixed_acc_pen.loc[((fixed_acc_pen['AÑO'] == 2020) & (fixed_acc_pen['TRIMESTRE'] == 2))][['NUM_INTERNET_ACCESS', 'POPULATION', 'FIX_INT_PENETRATION', 'DPTO_MPIO']].sort_values(by=['DPTO_MPIO'])
display(fix_acc_pen_q1.head(2))
display(fix_acc_pen_q3.head(2))
#Fixed access type
fixed_acc_type.rename(columns = {'TECNOLOGIA' : 'TECHNOLOGY', 'VELOCIDAD_BAJADA': 'AVG_OFFER_DL_FIX_ACC', 'VELOCIDAD_SUBIDA' : 'AVG_OFFER_UL_FIX_ACC', 'No DE ACCESOS':'NBR_TECH_ACCESS'} , inplace = True)
#One hot type of technology
onehot_fix_tech = pd.get_dummies(fixed_acc_type[['TECHNOLOGY']], prefix = "", prefix_sep="").mul(fixed_acc_type['NBR_TECH_ACCESS'], axis=0)
onehot_fix_group= pd.DataFrame()
onehot_fix_group['FIX_TECH_FTTH'] = onehot_fix_tech['FIBER TO THE HOME (FTTH)']
onehot_fix_group['FIX_TECH_FTTX'] = onehot_fix_tech['FIBER TO THE X (FTTX)'] + onehot_fix_tech['FIBER TO THE CABINET (FTTC)'] + onehot_fix_tech['FIBER TO THE BUILDING O FIBER'] + onehot_fix_tech['FIBER TO THE PREMISES']+ onehot_fix_tech['FIBER TO THE NODE (FTTN)']
onehot_fix_group['FIX_TECH_CABLE'] = onehot_fix_tech['CABLE'] + onehot_fix_tech['HYBRID FIBER COAXIAL (HFC)']
onehot_fix_group['FIX_TECH_COPPER'] = onehot_fix_tech['XDSL'] + onehot_fix_tech['OTRAS TECNOLOGÍAS FIJAS']
onehot_fix_group['FIX_TECH_FWA'] = onehot_fix_tech['OTRAS TECNOLOGÍAS INALÁMBRICA'] + onehot_fix_tech['WIFI'] + onehot_fix_tech['SATELITAL'] + onehot_fix_tech['WIMAX'] + onehot_fix_tech['FIBER TO THE ANTENNA (FTTA)']
fixed_acc_type_o = fixed_acc_type.join(onehot_fix_group)
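The get_dummies(...).mul(counts, axis=0) pattern turns a one-hot technology indicator into access counts per technology, ready to be summed in a later groupby. A minimal example with invented rows:

```python
import pandas as pd

df = pd.DataFrame({'TECHNOLOGY': ['FTTH', 'CABLE', 'FTTH'],
                   'NBR_TECH_ACCESS': [5, 3, 2]})

# One column per technology, scaled row-wise by the number of accesses
onehot = pd.get_dummies(df['TECHNOLOGY']).mul(df['NBR_TECH_ACCESS'], axis=0)
```

Summing these columns per municipality then yields the number of accesses per technology.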
#Onehot provider
onehot_fix_prov = pd.get_dummies(fixed_acc_type[['PROVEEDOR']], prefix = "", prefix_sep="").mul(fixed_acc_type['NBR_TECH_ACCESS'],0)
onehot_fix_prov_group= pd.DataFrame()
onehot_fix_prov_group['CLARO'] = onehot_fix_prov['COMUNICACION CELULAR S A COMCEL S A']
onehot_fix_prov_group['TIGO'] = onehot_fix_prov['UNE EPM TELECOMUNICACIONES S.A.']
onehot_fix_prov_group['MOVISTAR'] = onehot_fix_prov['COLOMBIA TELECOMUNICACIONES S.A. E.S.P.']
onehot_fix_prov_group['ETB'] = onehot_fix_prov['EMPRESA DE TELECOMUNICACIONES DE BOGOTA S.A. ESP']
onehot_fix_prov_group['EMCALI'] = onehot_fix_prov['EMPRESAS MUNICIPALES DE CALI EICE E.S.P']
onehot_fix_prov_group['DIRECTV'] = onehot_fix_prov['DIRECTV COLOMBIA LTDA']
onehot_fix_prov_group['EDATEL'] = onehot_fix_prov['EDATEL S.A.']
onehot_fix_prov_group['HV_MULT'] = onehot_fix_prov['H V TELEVISION S.A.S.']
onehot_fix_prov_group['TELEBMGA'] = onehot_fix_prov['EMPRESA DE TELECOMUNICACIONES DE BUCARAMANGA S.A. E.S.P.']
onehot_fix_prov_group['METROTEL'] = onehot_fix_prov['METROTEL S.A']
onehot_fix_prov_group['HUGHES'] = onehot_fix_prov['HUGHES DE COLOMBIA S.A.S.']
onehot_fix_prov_group['AZTECA'] = onehot_fix_prov['AZTECA COMUNICACIONES COLOMBIA S.A.S']
fixed_acc_type_o = fixed_acc_type_o.join(onehot_fix_prov_group)
# Cap average offered speeds at 100: values above are replaced with 100
fixed_acc_type_o['AVG_OFFER_DL_FIX_ACC'] = np.where(fixed_acc_type_o['AVG_OFFER_DL_FIX_ACC'] > 100, 100, fixed_acc_type_o['AVG_OFFER_DL_FIX_ACC'])
fixed_acc_type_o['AVG_OFFER_UL_FIX_ACC'] = np.where(fixed_acc_type_o['AVG_OFFER_UL_FIX_ACC'] > 100, 100, fixed_acc_type_o['AVG_OFFER_UL_FIX_ACC'])
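The same capping can be written more compactly with Series.clip, which is equivalent to the np.where expression used above:

```python
import pandas as pd

s = pd.Series([50.0, 150.0, 100.0])
capped = s.clip(upper=100)  # values above 100 become 100; the rest are unchanged
```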
fix_acc_type_q1 = fixed_acc_type_o.loc[((fixed_acc_type_o['AÑO'] == 2020) & (fixed_acc_type_o['TRIMESTRE'] == 1))].drop(['AÑO', 'TRIMESTRE','PROVEEDOR', 'DEPARTAMENTO', 'MUNICIPIO', 'SEGMENTO', 'TECHNOLOGY'], axis= 1)
fix_acc_type_q1_sum = fix_acc_type_q1[['DPTO_MPIO','NBR_TECH_ACCESS', 'CLARO', 'TIGO', 'MOVISTAR', 'ETB', 'EMCALI','DIRECTV','EDATEL','HV_MULT','TELEBMGA', 'METROTEL', 'HUGHES','AZTECA','FIX_TECH_FTTH', 'FIX_TECH_FTTX', 'FIX_TECH_CABLE', 'FIX_TECH_COPPER', 'FIX_TECH_FWA']].groupby(by = ['DPTO_MPIO']).sum()
fix_acc_type_q1_avg = fix_acc_type_q1[['DPTO_MPIO', 'AVG_OFFER_DL_FIX_ACC', 'AVG_OFFER_UL_FIX_ACC']].groupby(by = ['DPTO_MPIO']).mean()
fix_acc_type_q1_group = fix_acc_type_q1_sum.join(fix_acc_type_q1_avg).reset_index()
fix_acc_type_q1_group['%_FTTH'] = round((fix_acc_type_q1_group['FIX_TECH_FTTH'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
fix_acc_type_q1_group['%_FTTX'] = round((fix_acc_type_q1_group['FIX_TECH_FTTX'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
fix_acc_type_q1_group['%_CABLE'] = round((fix_acc_type_q1_group['FIX_TECH_CABLE'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
fix_acc_type_q1_group['%_FWA'] = round((fix_acc_type_q1_group['FIX_TECH_FWA'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
fix_acc_type_q1_group['%_COPPER'] = round((fix_acc_type_q1_group['FIX_TECH_COPPER'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
fix_acc_type_q1_group['%_CLARO'] = round((fix_acc_type_q1_group['CLARO'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
fix_acc_type_q1_group['%_TIGO'] = round((fix_acc_type_q1_group['TIGO'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
fix_acc_type_q1_group['%_MOVISTAR'] = round((fix_acc_type_q1_group['MOVISTAR'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
fix_acc_type_q1_group['%_ETB'] = round((fix_acc_type_q1_group['ETB'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
fix_acc_type_q1_group['%_EMCALI'] = round((fix_acc_type_q1_group['EMCALI'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
fix_acc_type_q1_group['%_DIRECTV'] = round((fix_acc_type_q1_group['DIRECTV'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
fix_acc_type_q1_group['%_EDATEL'] = round((fix_acc_type_q1_group['EDATEL'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
fix_acc_type_q1_group['%_HV_MULT'] = round((fix_acc_type_q1_group['HV_MULT'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
fix_acc_type_q1_group['%_TELEBMGA'] = round((fix_acc_type_q1_group['TELEBMGA'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
fix_acc_type_q1_group['%_METROTEL'] = round((fix_acc_type_q1_group['METROTEL'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
fix_acc_type_q1_group['%_HUGHES'] = round((fix_acc_type_q1_group['HUGHES'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
fix_acc_type_q1_group['%_AZTECA'] = round((fix_acc_type_q1_group['AZTECA'])/fix_acc_type_q1_group['NBR_TECH_ACCESS'],2)
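The repetitive share columns above can also be produced in one vectorized step: div broadcasts the denominator row-wise and add_prefix builds the '%_' names. A sketch with hypothetical counts:

```python
import pandas as pd

df = pd.DataFrame({'NBR_TECH_ACCESS': [10, 20],
                   'CLARO': [5, 10],
                   'TIGO': [5, 10]})

share_cols = ['CLARO', 'TIGO']
shares = (df[share_cols]
          .div(df['NBR_TECH_ACCESS'], axis=0)  # each count column / total accesses
          .round(2)
          .add_prefix('%_'))
```

The resulting columns ('%_CLARO', '%_TIGO', ...) can then be joined back onto the grouped dataframe.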
fix_acc_type_q3 = fixed_acc_type_o.loc[((fixed_acc_type_o['AÑO'] == 2020) & (fixed_acc_type_o['TRIMESTRE'] == 2))].drop(['AÑO', 'TRIMESTRE','PROVEEDOR', 'DEPARTAMENTO', 'MUNICIPIO', 'SEGMENTO', 'TECHNOLOGY'], axis= 1)
fix_acc_type_q3_sum = fix_acc_type_q3[['DPTO_MPIO','NBR_TECH_ACCESS', 'CLARO', 'TIGO', 'MOVISTAR', 'ETB', 'EMCALI','DIRECTV','EDATEL','HV_MULT','TELEBMGA', 'METROTEL', 'HUGHES','AZTECA','FIX_TECH_FTTH', 'FIX_TECH_FTTX', 'FIX_TECH_CABLE', 'FIX_TECH_COPPER', 'FIX_TECH_FWA']].groupby(by = ['DPTO_MPIO']).sum()
fix_acc_type_q3_avg = fix_acc_type_q3[['DPTO_MPIO', 'AVG_OFFER_DL_FIX_ACC', 'AVG_OFFER_UL_FIX_ACC']].groupby(by = ['DPTO_MPIO']).mean()
fix_acc_type_q3_group = fix_acc_type_q3_sum.join(fix_acc_type_q3_avg).reset_index()
# Share of each technology/provider in total accesses
for col in ['FIX_TECH_FTTH', 'FIX_TECH_FTTX', 'FIX_TECH_CABLE', 'FIX_TECH_FWA', 'FIX_TECH_COPPER', 'CLARO', 'TIGO', 'MOVISTAR', 'ETB', 'EMCALI', 'DIRECTV', 'EDATEL', 'HV_MULT', 'TELEBMGA', 'METROTEL', 'HUGHES', 'AZTECA']:
    fix_acc_type_q3_group['%_' + col.replace('FIX_TECH_', '')] = round(fix_acc_type_q3_group[col] / fix_acc_type_q3_group['NBR_TECH_ACCESS'], 2)
display(fix_acc_type_q1_group.head(2))
display(fix_acc_type_q3_group.head(2))
#Mobile Coverage Type
mobile_cov_tech_o = mobile_cov_tech[['AÑO','TRIMESTRE','COBERTURA 2G','COBERTURA 3G','COBERTURA HSPA+, HSPA+DC','COBERTUTA LTE','DPTO_MPIO']].replace({'S': 1, 'N': 0})
onehot_mov_prov = pd.get_dummies(mobile_cov_tech[['PROVEEDOR']], prefix = "", prefix_sep="")
onehot_mov_prov.rename(columns={"AVANTEL S.A.S" : 'AVANTEL', "COLOMBIA MOVIL S.A ESP" : 'TIGO', "COLOMBIA TELECOMUNICACIONES S.A. E.S.P.": "MOVISTAR", "COMUNICACION CELULAR S A COMCEL S A" : "CLARO", "EMPRESA DE TELECOMUNICACIONES DE BOGOTA S.A. ESP": "ETB"}, inplace = True)
mobile_cov_tech_o = mobile_cov_tech_o.join(onehot_mov_prov)
# An operator counts as present only where it reports at least one coverage technology
# ('COBERTUTA LTE' reproduces a typo in the source dataset's column name)
cov_cols = ['COBERTURA 2G', 'COBERTURA 3G', 'COBERTURA HSPA+, HSPA+DC', 'COBERTUTA LTE']
for op in ['AVANTEL', 'TIGO', 'MOVISTAR', 'CLARO', 'ETB']:
    mobile_cov_tech_o[op] = mobile_cov_tech_o[op] * mobile_cov_tech_o[cov_cols].max(axis = 1)
mobile_cov_tech_q1 = mobile_cov_tech_o.loc[((mobile_cov_tech_o['AÑO'] == 2020) & (mobile_cov_tech_o['TRIMESTRE'] == 1))]
mobile_cov_tech_q1_group = mobile_cov_tech_q1[['AVANTEL','TIGO','MOVISTAR','CLARO', 'ETB','COBERTURA 2G','COBERTURA 3G','COBERTURA HSPA+, HSPA+DC','COBERTUTA LTE','DPTO_MPIO']].groupby(by = ['DPTO_MPIO']).sum()
cov_cols = ['COBERTURA 2G', 'COBERTURA 3G', 'COBERTURA HSPA+, HSPA+DC', 'COBERTUTA LTE']
max_cov_q1 = mobile_cov_tech_q1_group[cov_cols].max(axis = 1)
mobile_cov_tech_q1_group['%_LTE'] = round(mobile_cov_tech_q1_group['COBERTUTA LTE'] / max_cov_q1, 2)
for op in ['AVANTEL', 'TIGO', 'MOVISTAR', 'CLARO', 'ETB']:
    mobile_cov_tech_q1_group['%_' + op] = round(mobile_cov_tech_q1_group[op] / max_cov_q1, 2)
mobile_cov_tech_q1_group = mobile_cov_tech_q1_group.reset_index()
mobile_cov_tech_q3 = mobile_cov_tech_o.loc[((mobile_cov_tech_o['AÑO'] == 2020) & (mobile_cov_tech_o['TRIMESTRE'] == 2))]
mobile_cov_tech_q3_group = mobile_cov_tech_q3[['COBERTURA 2G','COBERTURA 3G','COBERTURA HSPA+, HSPA+DC','COBERTUTA LTE','DPTO_MPIO']].groupby(by = ['DPTO_MPIO']).sum()
mobile_cov_tech_q3_group['%_LTE'] = round(mobile_cov_tech_q3_group['COBERTUTA LTE']/(mobile_cov_tech_q3_group[['COBERTURA 2G','COBERTURA 3G','COBERTURA HSPA+, HSPA+DC','COBERTUTA LTE']].max(axis = 1)),2)
mobile_cov_tech_q3_group = mobile_cov_tech_q3_group.reset_index()
display(mobile_cov_tech_q1_group.head(2))
display(mobile_cov_tech_q3_group.head(2))
#Check total category sum in fixed type dataset
print(onehot_fix_group['FIX_TECH_FTTH'].sum() + onehot_fix_group['FIX_TECH_FTTX'].sum() + onehot_fix_group['FIX_TECH_CABLE'].sum() + onehot_fix_group['FIX_TECH_COPPER'].sum() + onehot_fix_group['FIX_TECH_FWA'].sum())
# Copy fixed consolidated
fixed_analysis = fixed_consolidated_geo.copy()
# Join fixed penetration and population
fixed_facc_pen_join_q1 = fix_acc_pen_q1[fix_acc_pen_q1['DPTO_MPIO'].isin(fixed_consolidated_geo['DPTO_MPIO'])]
fixed_analysis = fixed_analysis.merge(fixed_facc_pen_join_q1, how = 'left', on = 'DPTO_MPIO', suffixes = (None, "_Q1"))
fixed_facc_pen_join_q3 = fix_acc_pen_q3[fix_acc_pen_q3['DPTO_MPIO'].isin(fixed_consolidated_geo['DPTO_MPIO'])]
fixed_analysis = fixed_analysis.merge(fixed_facc_pen_join_q3, how = 'left', on = 'DPTO_MPIO', suffixes = (None, "_Q3"))
# Join fixed access type
fixed_facc_type_join_q1 = fix_acc_type_q1_group[fix_acc_type_q1_group['DPTO_MPIO'].isin(fixed_consolidated_geo['DPTO_MPIO'])]
fixed_analysis = fixed_analysis.merge(fixed_facc_type_join_q1, how = 'left', on = 'DPTO_MPIO', suffixes = (None, "_Q1"))
fixed_facc_type_join_q3 = fix_acc_type_q3_group[fix_acc_type_q3_group['DPTO_MPIO'].isin(fixed_consolidated_geo['DPTO_MPIO'])]
fixed_analysis = fixed_analysis.merge(fixed_facc_type_join_q3, how = 'left', on = 'DPTO_MPIO', suffixes = (None, "_Q3"))
fixed_analysis = fixed_analysis.merge(gdp_mun[['DPTO_MPIO','GDP']], how = 'left', on = 'DPTO_MPIO')
fixed_analysis.drop(['DPTO_CCDGO', 'MPIO_CRSLC', 'MPIO_CCNCT','MPIO_NANO', 'SHAPE_AREA','SHAPE_LEN','ORIG_FID'], axis= 1, inplace = True)
print(fixed_consolidated_geo.shape)
print("Fixed consolidated with additional information generated Shape:", fixed_analysis.shape)
display(fixed_analysis.sort_values(by=['%_FTTH'], ascending = False).head(2))
display(fixed_analysis.loc[(fixed_analysis.MPIO_CCDGO == 1)].sort_values(by=['POPULATION'], ascending = False).head(2))
# Copy mobile consolidated
mobile_analysis = mobile_consolidated_geo.copy()
# Join fixed penetration and population
mobile_facc_pen_join_q1 = fix_acc_pen_q1[fix_acc_pen_q1['DPTO_MPIO'].isin(mobile_consolidated_geo['DPTO_MPIO'])]
mobile_analysis = mobile_analysis.merge(mobile_facc_pen_join_q1, how = 'left', on = 'DPTO_MPIO', suffixes = (None, "_Q1"))
mobile_facc_pen_join_q3 = fix_acc_pen_q3[fix_acc_pen_q3['DPTO_MPIO'].isin(mobile_consolidated_geo['DPTO_MPIO'])]
mobile_analysis = mobile_analysis.merge(mobile_facc_pen_join_q3, how = 'left', on = 'DPTO_MPIO', suffixes = (None, "_Q3"))
# Join fixed access type
mobile_facc_type_join_q1 = fix_acc_type_q1_group[fix_acc_type_q1_group['DPTO_MPIO'].isin(mobile_consolidated_geo['DPTO_MPIO'])]
mobile_analysis = mobile_analysis.merge(mobile_facc_type_join_q1, how = 'left', on = 'DPTO_MPIO', suffixes = (None, "_Q1"))
mobile_facc_type_join_q3 = fix_acc_type_q3_group[fix_acc_type_q3_group['DPTO_MPIO'].isin(mobile_consolidated_geo['DPTO_MPIO'])]
mobile_analysis = mobile_analysis.merge(mobile_facc_type_join_q3, how = 'left', on = 'DPTO_MPIO', suffixes = (None, "_Q3"))
#Join mobile coverage type
mobile_cov_type_join_q1 = mobile_cov_tech_q1_group[mobile_cov_tech_q1_group['DPTO_MPIO'].isin(mobile_consolidated_geo['DPTO_MPIO'])]
mobile_analysis = mobile_analysis.merge(mobile_cov_type_join_q1, how = 'left', on = 'DPTO_MPIO', suffixes = (None, "_Q1"))
mobile_cov_type_join_q3 = mobile_cov_tech_q3_group[mobile_cov_tech_q3_group['DPTO_MPIO'].isin(mobile_consolidated_geo['DPTO_MPIO'])]
mobile_analysis = mobile_analysis.merge(mobile_cov_type_join_q3, how = 'left', on = 'DPTO_MPIO', suffixes = (None, "_Q3"))
mobile_analysis = mobile_analysis.merge(gdp_mun[['DPTO_MPIO','GDP']], how = 'left', on = 'DPTO_MPIO')
mobile_analysis.drop(['DPTO_CCDGO', 'MPIO_CRSLC', 'MPIO_CCNCT','MPIO_NANO', 'SHAPE_AREA','SHAPE_LEN','ORIG_FID'], axis= 1, inplace = True)
print("Mobile consolidated with additional information generated Shape:", mobile_analysis.shape)
display(mobile_analysis.sort_values(by=['%_LTE'], ascending = False).head(2))
display(mobile_analysis.loc[(mobile_analysis.MPIO_CCDGO == 1)].sort_values(by=['POPULATION'], ascending = False).head(2))
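The quarterly joins above rely on pandas' suffix handling to keep Q1 and Q3 columns apart in one frame. A minimal sketch of that behavior on toy data (the `DPTO_MPIO` codes and population figures here are made up for illustration):

```python
import pandas as pd

# Toy Q1/Q3 frames with an overlapping column name
q1 = pd.DataFrame({'DPTO_MPIO': ['05001', '08001'], 'POPULATION': [2500000, 1200000]})
q3 = pd.DataFrame({'DPTO_MPIO': ['05001', '08001'], 'POPULATION': [2510000, 1210000]})

# suffixes=(None, '_Q3'): the left frame's column keeps its name,
# the right frame's overlapping column gains the _Q3 suffix
merged = q1.merge(q3, how='left', on='DPTO_MPIO', suffixes=(None, '_Q3'))
print(list(merged.columns))  # ['DPTO_MPIO', 'POPULATION', 'POPULATION_Q3']
```

Passing `None` for one side of `suffixes` is what lets the first-joined quarter keep the bare column names while later joins get `_Q1`/`_Q3` tags.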
#list_mcpio_mobile = mobile_consolidated_geo[['DPTO_MPIO', 'DPTO_CCDGO', 'MPIO_CCDGO', 'MPIO_CNMBR', 'DPTO_CNMBR']]
#list_mcpio_mobile.to_csv('gs://imperial_analytics_project/backup/mobile_consolidated/list_cities.csv')
# Fixed analysis
fixed_analysis['dif_avg_d_kbps_q3_q1'] = fixed_analysis['avg_d_kbps_q3']-fixed_analysis['avg_d_kbps_q1']
fixed_analysis['%_dif_avg_d_q3_q1'] = fixed_analysis['dif_avg_d_kbps_q3_q1'] / fixed_analysis['avg_d_kbps_q1']
fixed_analysis['dif_avg_u_kbps_q3_q1'] = fixed_analysis['avg_u_kbps_q3']-fixed_analysis['avg_u_kbps_q1']
fixed_analysis['%_dif_avg_u_q3_q1'] = fixed_analysis['dif_avg_u_kbps_q3_q1'] / fixed_analysis['avg_u_kbps_q1']
fixed_analysis['dif_avg_lat_ms_q3_q1'] = fixed_analysis['avg_lat_ms_q3']-fixed_analysis['avg_lat_ms_q1']
fixed_analysis['%_dif_avg_lat_q3_q1'] = fixed_analysis['dif_avg_lat_ms_q3_q1'] / fixed_analysis['avg_lat_ms_q1']
fixed_analysis['%_devices_q3_q1'] = fixed_analysis['devices_q3']/fixed_analysis['devices_q1']
fixed_analysis['POP_DENSITY'] = fixed_analysis['POPULATION_Q3']/fixed_analysis['MPIO_NAREA']
fixed_analysis['GDP_CAPITA'] = fixed_analysis['GDP']/fixed_analysis['POPULATION_Q3']
print("Fixed calculated with difference in KPIs Shape:", fixed_analysis.shape)
display(fixed_analysis.loc[(fixed_analysis['devices_q1'] > 10) & (fixed_analysis['devices_q3'] > 10)].sort_values(by=['%_dif_avg_d_q3_q1'], ascending = True).head(2))
# Mobile analysis
mobile_analysis['dif_avg_d_kbps_q3_q1'] = mobile_analysis['avg_d_kbps_q3']-mobile_analysis['avg_d_kbps_q1']
mobile_analysis['%_dif_avg_d_q3_q1'] = mobile_analysis['dif_avg_d_kbps_q3_q1'] / mobile_analysis['avg_d_kbps_q1']
mobile_analysis['dif_avg_u_kbps_q3_q1'] = mobile_analysis['avg_u_kbps_q3']-mobile_analysis['avg_u_kbps_q1']
mobile_analysis['%_dif_avg_u_q3_q1'] = mobile_analysis['dif_avg_u_kbps_q3_q1'] / mobile_analysis['avg_u_kbps_q1']
mobile_analysis['dif_avg_lat_ms_q3_q1'] = mobile_analysis['avg_lat_ms_q3']-mobile_analysis['avg_lat_ms_q1']
mobile_analysis['%_dif_avg_lat_q3_q1'] = mobile_analysis['dif_avg_lat_ms_q3_q1'] / mobile_analysis['avg_lat_ms_q1']
mobile_analysis['%_devices_q3_q1'] = mobile_analysis['devices_q3']/mobile_analysis['devices_q1']
mobile_analysis['POP_DENSITY'] = mobile_analysis['POPULATION_Q3']/mobile_analysis['MPIO_NAREA']
mobile_analysis['GDP_CAPITA'] = mobile_analysis['GDP']/mobile_analysis['POPULATION_Q3']
print("Mobile calculated with difference in KPIs Shape:", mobile_analysis.shape)
display(mobile_analysis.loc[(mobile_analysis['devices_q1'] > 10) & (mobile_analysis['devices_q3'] > 10)].sort_values(by=['%_dif_avg_d_q3_q1'], ascending = True).head(2))
fixed_analysis_no_geo = fixed_analysis.drop(['geometry'], axis = 1)
fixed_analysis_no_geo = fixed_analysis_no_geo.drop_duplicates()
print("Fixed without geometry created Shape:", fixed_analysis_no_geo.shape)
print(fixed_facc_pen_join_q1.shape)
mobile_analysis_no_geo = mobile_analysis.drop(['geometry'], axis = 1)
mobile_analysis_no_geo = mobile_analysis_no_geo.drop_duplicates()
print("Mobile without geometry created Shape:", mobile_analysis_no_geo.shape)
print(mobile_facc_pen_join_q1.shape)
agg_traf_day = agg_traf_op.copy()
agg_traf_base_day = agg_traf_base.copy()
agg_traf_day.rename(columns={" Fecha del día de tráfico" : 'DATE', "Tráfico Datos Total Día (GB)" : 'DAILY_TRAFFIC'}, inplace = True)
agg_traf_day.drop(['NIT Proveedor', 'Hora Pico', 'Tráfico Datos Internacional (GB)', 'Tráfico Datos NAPs - Colombia (GB)', 'Tráfico Datos Acuerdos de tránsito o peering directo (GB)', 'Tráfico Datos Local (GB)'] , axis = 1, inplace = True)
agg_traf_day['DATE'] = pd.to_datetime(agg_traf_day['DATE'], format = '%d/%m/%Y')
agg_traf_day.dtypes
agg_traf_day_group = agg_traf_day[['DATE','DAILY_TRAFFIC']].groupby(by = ['DATE']).sum().reset_index()
agg_traf_day_group['MONTH'] = pd.DatetimeIndex(agg_traf_day_group['DATE']).month
agg_traf_day_group['MONTHLY_AVG'] = agg_traf_day_group['DAILY_TRAFFIC']
agg_traf_day['TYPE'] = "daily traffic"
# Replace daily values with each month's average for March onwards (lockdown period)
for i in range(3, 13):
    agg_traf_day_group['MONTHLY_AVG'] = np.where((agg_traf_day_group['MONTH'] == i), agg_traf_day_group.loc[agg_traf_day_group['MONTH'] == i]['DAILY_TRAFFIC'].mean(), agg_traf_day_group['MONTHLY_AVG'])
agg_traf_day_group = agg_traf_day_group.drop(['MONTH'], axis = 1)
agg_traf_day_group['TYPE'] = "daily traffic"
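The month-by-month loop above can also be expressed with pandas' groupby-transform idiom. A sketch on toy data (column names mirror those above; note the original loop leaves January and February as raw daily values, so a month mask would be needed for strict equivalence):

```python
import pandas as pd

# Toy daily-traffic frame mirroring agg_traf_day_group's columns
df = pd.DataFrame({
    'DATE': pd.to_datetime(['2020-03-01', '2020-03-02', '2020-04-01', '2020-04-02']),
    'DAILY_TRAFFIC': [10.0, 14.0, 20.0, 22.0],
})
# One line instead of the per-month loop: each row gets its own month's mean
df['MONTHLY_AVG'] = df.groupby(df['DATE'].dt.month)['DAILY_TRAFFIC'].transform('mean')
print(df['MONTHLY_AVG'].tolist())  # [12.0, 12.0, 21.0, 21.0]
```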
agg_traf_base_day['AVG_BASE_TRAFFIC'] = agg_traf_base_day['AVG_BASE_TRAFFIC'].str.replace(',','').astype('float')
agg_traf_base_day['DATE'] = pd.to_datetime(agg_traf_base_day['DATE'], format = '%d/%m/%Y')
agg_traf_base_day = agg_traf_base_day.drop(['NIT Proveedor'], axis = 1).rename(columns={"AVG_BASE_TRAFFIC" : 'DAILY_TRAFFIC'})
agg_traf_base_day['TYPE'] = "monthly average base"
agg_traf_base_group = agg_traf_base_day[['DATE','DAILY_TRAFFIC']].groupby(by = ['DATE']).sum().reset_index()
agg_traf_base_group['TYPE'] = "monthly average base"
#Concat DataFrames
agg_traf_total_concat = pd.concat([agg_traf_day, agg_traf_base_day])
agg_traf_group_concat = pd.concat([agg_traf_day_group, agg_traf_base_group])
display(agg_traf_group_concat.tail(2))
print(agg_traf_group_concat.shape)
display(agg_traf_total_concat.head(2))
print(agg_traf_total_concat.shape)
The data preparation process has resulted in three datasets that will be used for the analysis.
fig_dims = (20, 14)
fig, ax = plt.subplots(2,1, figsize=fig_dims)
ax[0].set(ylim=(3e7, 10e7))
sns.lineplot(data = agg_traf_group_concat, x = 'DATE', y = 'DAILY_TRAFFIC', style = 'TYPE', ax=ax[0], legend = "full")
sns.lineplot(data = agg_traf_day_group, x = 'DATE', y = 'MONTHLY_AVG', ax=ax[0], legend = "full", label = "traffic monthly average", color = "red")
g = sns.lineplot(data = agg_traf_total_concat, x = 'DATE', y = 'DAILY_TRAFFIC', style = "TYPE", hue = 'Proveedor', ax=ax[1], legend = "brief")
g.legend(ncol = 5, bbox_to_anchor=(0.9, -0.08))
Remarks: The graphs show an evident increase in traffic starting in March, jumping from a daily average of 52 to 57 PB in January and February to a daily average of 72 to 90 PB from March to December. Colombia imposed its lockdown on March 22nd and started relaxing restrictions in May; the general quarantine ended in August. The charts show that traffic did not decrease after the economy re-opened. On the contrary, it appears to keep growing. This might be explained by the fact that schools remained closed and many companies continued to work remotely. However, it might also indicate that society has undergone a digital transformation that will stay in place even after the pandemic emergency is over. In that case, network infrastructure will have to adapt to this new reality.
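The pre- versus post-lockdown comparison described above can be sketched as a simple date split (toy figures here, in GB, chosen only to illustrate the calculation; 1 PB = 1e6 GB):

```python
import pandas as pd

# Toy daily totals (GB) around the 2020-03-22 lockdown date
traffic = pd.DataFrame({
    'DATE': pd.to_datetime(['2020-01-15', '2020-02-15', '2020-03-25', '2020-04-15']),
    'DAILY_TRAFFIC': [52e6, 57e6, 72e6, 90e6],
})
lockdown = pd.Timestamp('2020-03-22')
# Average daily traffic before and after the lockdown, converted to PB
pre = traffic.loc[traffic['DATE'] < lockdown, 'DAILY_TRAFFIC'].mean() / 1e6
post = traffic.loc[traffic['DATE'] >= lockdown, 'DAILY_TRAFFIC'].mean() / 1e6
print(round(pre, 1), round(post, 1))  # 54.5 81.0
```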
fig_dims = (20, 20)
fig, ax = plt.subplots(2,2, figsize=fig_dims)
df_plot1 = fixed_consolidated_geo.copy()
df_plot2 = mobile_consolidated_geo.copy()
variable1 = 'avg_d_kbps_q1'
variable2 = 'avg_d_kbps_q3'
scheme_1 = mpclas.Quantiles(df_plot1[variable1], k=4)
scheme_2 = mpclas.Quantiles(df_plot2[variable1], k=4)
gpl.choropleth(df_plot1, cmap="Blues", hue=df_plot1[variable1], scheme=scheme_1, legend = True, ax = ax[0,0])
gpl.choropleth(df_plot1, cmap="Blues", hue=df_plot1[variable2], scheme=scheme_1, legend = True, ax = ax[0,1])
gpl.choropleth(df_plot2, cmap="Blues", hue=df_plot2[variable1], scheme=scheme_2, legend = True, ax = ax[1,0])
gpl.choropleth(df_plot2, cmap="Blues", hue=df_plot2[variable2], scheme=scheme_2, legend = True, ax = ax[1,1])
fig.suptitle("Average DL Speed (kbps) per Type of Connectivity and Quarter", fontsize=16)
ax[0,0].set_title("Average Fixed DL Speed Q1 (kbps)")
ax[0,1].set_title("Average Fixed DL Speed Q3 (kbps)")
ax[1,0].set_title("Average Mobile DL Speed Q1 (kbps)")
ax[1,1].set_title("Average Mobile DL Speed Q3 (kbps)")
fig.tight_layout()
fig.subplots_adjust(top=0.94)
fig_dims = (20, 20)
fig, ax = plt.subplots(2,2, figsize=fig_dims)
df_plot1 = fixed_consolidated_geo.copy()
df_plot2 = mobile_consolidated_geo.copy()
variable1 = 'avg_u_kbps_q1'
variable2 = 'avg_u_kbps_q3'
scheme_1 = mpclas.Quantiles(df_plot1[variable1], k=4)
scheme_2 = mpclas.Quantiles(df_plot2[variable1], k=4)
gpl.choropleth(df_plot1, cmap="Blues", hue=df_plot1[variable1], scheme=scheme_1, legend = True, ax = ax[0,0])
gpl.choropleth(df_plot1, cmap="Blues", hue=df_plot1[variable2], scheme=scheme_1, legend = True, ax = ax[0,1])
gpl.choropleth(df_plot2, cmap="Blues", hue=df_plot2[variable1], scheme=scheme_2, legend = True, ax = ax[1,0])
gpl.choropleth(df_plot2, cmap="Blues", hue=df_plot2[variable2], scheme=scheme_2, legend = True, ax = ax[1,1])
fig.suptitle("Average UL Speed (kbps) per Type of Connectivity and Quarter", fontsize=16)
ax[0,0].set_title("Average Fixed UL Speed Q1 (kbps)")
ax[0,1].set_title("Average Fixed UL Speed Q3 (kbps)")
ax[1,0].set_title("Average Mobile UL Speed Q1 (kbps)")
ax[1,1].set_title("Average Mobile UL Speed Q3 (kbps)")
fig.tight_layout()
fig.subplots_adjust(top=0.94)
fig_dims = (20, 20)
fig, ax = plt.subplots(2,2, figsize=fig_dims)
df_plot1 = fixed_consolidated_geo.copy()
df_plot2 = mobile_consolidated_geo.copy()
variable1 = 'avg_lat_ms_q1'
variable2 = 'avg_lat_ms_q3'
scheme_1 = mpclas.Quantiles(df_plot1[variable1], k=4)
scheme_2 = mpclas.Quantiles(df_plot2[variable1], k=4)
gpl.choropleth(df_plot1, cmap="Blues", hue=df_plot1[variable1], scheme=scheme_1, legend = True, ax = ax[0,0])
gpl.choropleth(df_plot1, cmap="Blues", hue=df_plot1[variable2], scheme=scheme_1, legend = True, ax = ax[0,1])
gpl.choropleth(df_plot2, cmap="Blues", hue=df_plot2[variable1], scheme=scheme_2, legend = True, ax = ax[1,0])
gpl.choropleth(df_plot2, cmap="Blues", hue=df_plot2[variable2], scheme=scheme_2, legend = True, ax = ax[1,1])
fig.suptitle("Average Latency (ms) per Type of Connectivity and Quarter", fontsize=16)
ax[0,0].set_title("Average Fixed Latency Q1 (ms)")
ax[0,1].set_title("Average Fixed Latency Q3 (ms)")
ax[1,0].set_title("Average Mobile Latency Q1 (ms)")
ax[1,1].set_title("Average Mobile Latency Q3 (ms)")
fig.tight_layout()
fig.subplots_adjust(top=0.94)
# Understand kpi distribution using describe function
display(fixed_analysis_no_geo[['devices_q1', 'devices_q2', 'devices_q3', 'avg_d_kbps_q1', 'avg_d_kbps_q2','avg_d_kbps_q3', 'avg_u_kbps_q1', 'avg_u_kbps_q2' , 'avg_u_kbps_q3', 'avg_lat_ms_q1', 'avg_lat_ms_q2', 'avg_lat_ms_q3']].describe())
display(mobile_analysis_no_geo[['devices_q1', 'devices_q2', 'devices_q3', 'avg_d_kbps_q1', 'avg_d_kbps_q2','avg_d_kbps_q3', 'avg_u_kbps_q1', 'avg_u_kbps_q2' , 'avg_u_kbps_q3', 'avg_lat_ms_q1', 'avg_lat_ms_q2', 'avg_lat_ms_q3']].describe())
fix_un_pivot_kpi = fixed_analysis_no_geo[['avg_d_kbps_q1','avg_d_kbps_q2','avg_d_kbps_q3']].melt(var_name='Quarter_DL' , value_name = 'Avg_DL_kbps')
fix_un_pivot_kpi = fix_un_pivot_kpi.join(fixed_analysis_no_geo[['avg_u_kbps_q1','avg_u_kbps_q2','avg_u_kbps_q3']].melt(var_name='Quarter_UL' , value_name = 'Avg_UL_kbps'))
fix_un_pivot_kpi = fix_un_pivot_kpi.join(fixed_analysis_no_geo[['avg_lat_ms_q1','avg_lat_ms_q2','avg_lat_ms_q3']].melt(var_name='Quarter_Lat' , value_name = 'Avg_Lat_ms'))
fix_un_pivot_kpi = fix_un_pivot_kpi.join(fixed_analysis_no_geo[['devices_q1','devices_q2','devices_q3']].melt(var_name='Quarter_dev' , value_name = 'Number_unique_devices'))
fix_un_pivot_kpi['Quarter'] = fix_un_pivot_kpi['Quarter_DL'].str[-2:]
fix_un_pivot_kpi = fix_un_pivot_kpi.drop(['Quarter_DL', 'Quarter_UL', 'Quarter_Lat', 'Quarter_dev'], axis = 1)
fig, ax1 = plt.subplots(1,2, figsize=(23,6))
sns.lineplot(data=fix_un_pivot_kpi, x="Quarter", y="Avg_DL_kbps", label= "Avg_Downlink_Speed", ax = ax1[0], err_style = "bars" )
sns.lineplot(data=fix_un_pivot_kpi, x="Quarter", y="Avg_UL_kbps", label= "Avg_Uplink_Speed", ax = ax1[0], err_style = "bars")
sns.lineplot(data=fix_un_pivot_kpi, x="Quarter", y="Avg_Lat_ms", ax = ax1[1], err_style = "bars")
ax1[0].legend(loc='lower right', fontsize = 12)
ax1[0].set_title("Fixed Average DL/UL Speed KPIs Evolution per Quarter")
ax1[0].set_ylabel('Average Speed (kbps)')
ax1[1].set_title("Fixed Average Latency KPI Evolution per Quarter")
mob_un_pivot_kpi = mobile_analysis_no_geo[['avg_d_kbps_q1','avg_d_kbps_q2','avg_d_kbps_q3']].melt(var_name='Quarter_DL' , value_name = 'Avg_DL_kbps')
mob_un_pivot_kpi = mob_un_pivot_kpi.join(mobile_analysis_no_geo[['avg_u_kbps_q1','avg_u_kbps_q2','avg_u_kbps_q3']].melt(var_name='Quarter_UL' , value_name = 'Avg_UL_kbps'))
mob_un_pivot_kpi = mob_un_pivot_kpi.join(mobile_analysis_no_geo[['avg_lat_ms_q1','avg_lat_ms_q2','avg_lat_ms_q3']].melt(var_name='Quarter_Lat' , value_name = 'Avg_Lat_ms'))
mob_un_pivot_kpi = mob_un_pivot_kpi.join(mobile_analysis_no_geo[['devices_q1','devices_q2','devices_q3']].melt(var_name='Quarter_dev' , value_name = 'Number_unique_devices'))
mob_un_pivot_kpi['Quarter'] = mob_un_pivot_kpi['Quarter_DL'].str[-2:]
mob_un_pivot_kpi = mob_un_pivot_kpi.drop(['Quarter_DL', 'Quarter_UL', 'Quarter_Lat', 'Quarter_dev'], axis = 1)
fig, ax1 = plt.subplots(1,2, figsize=(23,6))
sns.lineplot(data=mob_un_pivot_kpi, x="Quarter", y="Avg_DL_kbps", label= "Avg_Downlink_Speed", ax = ax1[0], err_style = "bars" )
sns.lineplot(data=mob_un_pivot_kpi, x="Quarter", y="Avg_UL_kbps", label= "Avg_Uplink_Speed", ax = ax1[0], err_style = "bars")
sns.lineplot(data=mob_un_pivot_kpi, x="Quarter", y="Avg_Lat_ms", ax = ax1[1], err_style = "bars")
ax1[0].legend(loc='center right', fontsize = 12)
ax1[0].set_title("Mobile Average DL/UL Speed KPIs Evolution per Quarter")
ax1[0].set_ylabel('Average Speed (kbps)')
ax1[1].set_title("Mobile Average Latency KPI Evolution per Quarter")
# Plot Average DL Distribution
fig, ax1 = plt.subplots(1,2 , figsize=(23,6))
variable1 = 'avg_d_kbps_q1'
variable2 = 'avg_d_kbps_q3'
offset = 0
bins = np.linspace(0, 50000, 100)
xx = np.linspace(0, 50000, 1000)
fxq1 = fixed_analysis_no_geo[variable1]
fkdeq1 = stats.gaussian_kde(fxq1)
fxq3 = fixed_analysis_no_geo[variable2]
fkdeq3 = stats.gaussian_kde(fxq3)
mxq1 = mobile_analysis_no_geo[variable1]
mkdeq1 = stats.gaussian_kde(mxq1)
mxq3 = mobile_analysis_no_geo[variable2]
mkdeq3 = stats.gaussian_kde(mxq3)
ax2 = np.array([a.twinx() for a in ax1.ravel()])
ax1[0].hist(fxq1, bins= bins - offset, alpha=0.2, label='Average DL Speed Q1')
ax1[0].hist(fxq3, bins= bins- offset, alpha=0.2, label='Average DL Speed Q3')
ax2[0].plot(xx- offset, fkdeq1(xx), label = "Density_Function_Q1")
ax2[0].plot(xx - offset, fkdeq3(xx), label = "Density_Function_Q3")
ax1[0].set_xlabel('Average DL Speed (kbps)')
ax1[0].set_ylabel('Amount of Municipalities')
ax2[0].set_ylabel('KDE Distribution')
ax1[0].legend(loc='upper right')
ax2[0].legend(bbox_to_anchor=(1, 0.85))
ax2[0].set_ylim(0,0.000115)
ax1[0].set_title("Average Fixed DL Speed Distribution")
ax1[1].hist(mxq1, bins= bins - offset, alpha=0.2, label='Average DL Speed Q1')
ax1[1].hist(mxq3, bins= bins- offset, alpha=0.2, label='Average DL Speed Q3')
ax2[1].plot(xx - offset, mkdeq1(xx), label = "Density_Function_Q1")
ax2[1].plot(xx - offset, mkdeq3(xx), label = "Density_Function_Q3")
ax1[1].set_xlabel('Average DL Speed (kbps)')
ax1[1].set_ylabel('Amount of Municipalities')
ax2[1].set_ylabel('KDE Distribution')
ax1[1].legend(loc='upper right')
ax2[1].legend(bbox_to_anchor=(1, 0.85))
ax2[1].set_ylim(0,4.3e-5)
ax1[1].set_title("Average Mobile DL Speed Distribution")
fig.suptitle("Average DL Speed Distribution per Type of Connectivity and Quarter", fontsize=16)
plt.show()
# Plot Average UL Distribution
fig, ax1 = plt.subplots(1,2 , figsize=(23,6))
variable1 = 'avg_u_kbps_q1'
variable2 = 'avg_u_kbps_q3'
offset = 0
bins = np.linspace(0, 40000, 100)
xx = np.linspace(0, 40000, 1000)
fxq1 = fixed_analysis_no_geo[variable1]
fkdeq1 = stats.gaussian_kde(fxq1)
fxq3 = fixed_analysis_no_geo[variable2]
fkdeq3 = stats.gaussian_kde(fxq3)
mxq1 = mobile_analysis_no_geo[variable1]
mkdeq1 = stats.gaussian_kde(mxq1)
mxq3 = mobile_analysis_no_geo[variable2]
mkdeq3 = stats.gaussian_kde(mxq3)
ax2 = np.array([a.twinx() for a in ax1.ravel()])
ax1[0].hist(fxq1, bins= bins - offset, alpha=0.2, label='Average UL Speed Q1')
ax1[0].hist(fxq3, bins= bins- offset, alpha=0.2, label='Average UL Speed Q3')
ax2[0].plot(xx- offset, fkdeq1(xx), label = "Density_Function_Q1")
ax2[0].plot(xx - offset, fkdeq3(xx), label = "Density_Function_Q3")
ax1[0].set_xlabel('Average UL Speed (kbps)')
ax1[0].set_ylabel('Amount of Municipalities')
ax2[0].set_ylabel('KDE Distribution')
ax1[0].legend(loc='upper right')
ax2[0].legend(bbox_to_anchor=(1, 0.85))
ax2[0].set_ylim(0, 1.55e-4)
ax1[0].set_title("Average Fixed UL Speed Distribution")
ax1[1].hist(mxq1, bins= bins - offset, alpha=0.2, label='Average UL Speed Q1')
ax1[1].hist(mxq3, bins= bins- offset, alpha=0.2, label='Average UL Speed Q3')
ax2[1].plot(xx- offset, mkdeq1(xx), label = "Density_Function_Q1")
ax2[1].plot(xx - offset, mkdeq3(xx), label = "Density_Function_Q3")
ax1[1].set_xlabel('Average UL Speed (kbps)')
ax1[1].set_ylabel('Amount of Municipalities')
ax2[1].set_ylabel('KDE Distribution')
ax1[1].legend(loc='upper right')
ax2[1].legend(bbox_to_anchor=(1, 0.85))
ax2[1].set_ylim(0,0.9e-4)
ax1[1].set_title("Average Mobile UL Speed Distribution")
fig.suptitle("Average UL Speed Distribution per Type of Connectivity and Quarter", fontsize=16)
plt.show()
# Plot Average Latency Distribution
fig, ax1 = plt.subplots(1,2 , figsize=(23,6))
variable1 = 'avg_lat_ms_q1'
variable2 = 'avg_lat_ms_q3'
offset = 0
bins = np.linspace(0, 400, 100)
xx = np.linspace(0, 400, 1000)
fxq1 = fixed_analysis_no_geo[variable1]
fkdeq1 = stats.gaussian_kde(fxq1)
fxq3 = fixed_analysis_no_geo[variable2]
fkdeq3 = stats.gaussian_kde(fxq3)
mxq1 = mobile_analysis_no_geo[variable1]
mkdeq1 = stats.gaussian_kde(mxq1)
mxq3 = mobile_analysis_no_geo[variable2]
mkdeq3 = stats.gaussian_kde(mxq3)
ax2 = np.array([a.twinx() for a in ax1.ravel()])
ax1[0].hist(fxq1, bins= bins - offset, alpha=0.2, label='Average Latency Q1')
ax1[0].hist(fxq3, bins= bins- offset, alpha=0.2, label='Average Latency Q3')
ax2[0].plot(xx- offset, fkdeq1(xx), label = "Density_Function_Q1")
ax2[0].plot(xx - offset, fkdeq3(xx), label = "Density_Function_Q3")
ax1[0].set_xlabel('Average Latency (ms)')
ax1[0].set_ylabel('Amount of Municipalities')
ax2[0].set_ylabel('KDE Distribution')
ax1[0].legend(loc='upper right')
ax2[0].legend(bbox_to_anchor=(1, 0.85))
ax2[0].set_ylim(0, 0.0095)
ax1[0].set_title("Average Fixed Latency Distribution")
ax1[1].hist(mxq1, bins= bins - offset, alpha=0.2, label='Average Latency Q1')
ax1[1].hist(mxq3, bins= bins- offset, alpha=0.2, label='Average Latency Q3')
ax2[1].plot(xx- offset, mkdeq1(xx), label = "Density_Function_Q1")
ax2[1].plot(xx - offset, mkdeq3(xx), label = "Density_Function_Q3")
ax1[1].set_xlabel('Average Latency (ms)')
ax1[1].set_ylabel('Amount of Municipalities')
ax2[1].set_ylabel('KDE Distribution')
ax1[1].legend(loc='upper right')
ax2[1].legend(bbox_to_anchor=(1, 0.85))
ax2[1].set_ylim(0,0.029)
ax1[1].set_title("Average Mobile Latency Distribution")
fig.suptitle("Average Latency Distribution per Type of Connectivity and Quarter", fontsize=16)
plt.show()
# Plot Average DL Distribution difference
fig, ax1 = plt.subplots(1,2 , figsize=(23,6))
variable1 = 'dif_avg_d_kbps_q3_q1'
offset = 1
bins = np.linspace(-20000, 20000, 200)
xx = np.linspace(-20000, 20000, 1000)
fx = fixed_analysis_no_geo[variable1]
fkde = stats.gaussian_kde(fx)
mx = mobile_analysis_no_geo[variable1]
mkde = stats.gaussian_kde(mx)
ax2 = np.array([a.twinx() for a in ax1.ravel()])
ax1[0].hist(fx, bins= bins - offset, alpha=0.2, label='Average DL Speed Difference (Q3-Q1)')
ax2[0].plot(xx- offset, fkde(xx), label = "Density_Function")
ax1[0].set_xlabel('Average DL Speed (kbps)')
ax1[0].set_ylabel('Amount of Municipalities')
ax2[0].set_ylabel('KDE Distribution')
ax1[0].legend(loc='upper right')
#ax2[0].legend(bbox_to_anchor=(1, 0.85))
#ax2[0].set_ylim(0, 0.007)
ax1[0].set_title("Average Fixed DL Speed Difference (Q3-Q1) Distribution")
ax1[1].hist(mx, bins= bins - offset, alpha=0.2, label='Average DL Speed Difference (Q3-Q1)')
ax2[1].plot(xx- offset, mkde(xx), label = "Density_Function")
ax1[1].set_xlabel('Average DL Speed (kbps)')
ax1[1].set_ylabel('Amount of Municipalities')
ax2[1].set_ylabel('KDE Distribution')
ax1[1].legend(loc='upper right')
#ax2[1].legend(bbox_to_anchor=(1, 0.85))
#ax2[1].set_ylim(0,0.007)
ax1[1].set_title("Average Mobile DL Speed Difference (Q3-Q1) Distribution")
fig.suptitle("Average DL Speed Difference (Q3-Q1) Distribution per Type of Connectivity", fontsize=16)
plt.show()
# Plot Average UL Distribution difference
fig, ax1 = plt.subplots(1,2 , figsize=(23,6))
variable1 = 'dif_avg_u_kbps_q3_q1'
offset = 1
bins = np.linspace(-20000, 20000, 200)
xx = np.linspace(-20000, 20000, 1000)
fx = fixed_analysis_no_geo[variable1]
fkde = stats.gaussian_kde(fx)
mx = mobile_analysis_no_geo[variable1]
mkde = stats.gaussian_kde(mx)
ax2 = np.array([a.twinx() for a in ax1.ravel()])
ax1[0].hist(fx, bins= bins - offset, alpha=0.2, label='Average UL Speed Difference (Q3-Q1)')
ax2[0].plot(xx- offset, fkde(xx), label = "Density_Function")
ax1[0].set_xlabel('Average UL Speed (kbps)')
ax1[0].set_ylabel('Amount of Municipalities')
ax2[0].set_ylabel('KDE Distribution')
ax1[0].legend(loc='upper right')
#ax2[0].legend(bbox_to_anchor=(1, 0.85))
#ax2[0].set_ylim(0, 0.007)
ax1[0].set_title("Average Fixed UL Speed Difference (Q3-Q1) Distribution")
ax1[1].hist(mx, bins= bins - offset, alpha=0.2, label='Average UL Speed Difference (Q3-Q1)')
ax2[1].plot(xx- offset, mkde(xx), label = "Density_Function")
ax1[1].set_xlabel('Average UL Speed (kbps)')
ax1[1].set_ylabel('Amount of Municipalities')
ax2[1].set_ylabel('KDE Distribution')
ax1[1].legend(loc='upper right')
#ax2[1].legend(bbox_to_anchor=(1, 0.85))
#ax2[1].set_ylim(0,0.007)
ax1[1].set_title("Average Mobile UL Speed Difference (Q3-Q1) Distribution")
fig.suptitle("Average UL Speed Difference (Q3-Q1) Distribution per Type of Connectivity", fontsize=16)
plt.show()
# Plot Average Latency Distribution difference
fig, ax1 = plt.subplots(1,2 , figsize=(23,6))
variable1 = 'dif_avg_lat_ms_q3_q1'
offset = 1
bins = np.linspace(-100, 100, 200)
xx = np.linspace(-100, 100, 1000)
fx = fixed_analysis_no_geo[variable1]
fkde = stats.gaussian_kde(fx)
mx = mobile_analysis_no_geo[variable1]
mkde = stats.gaussian_kde(mx)
ax2 = np.array([a.twinx() for a in ax1.ravel()])
ax1[0].hist(fx, bins= bins - offset, alpha=0.2, label='Average Latency Difference (Q3-Q1)')
ax2[0].plot(xx- offset, fkde(xx), label = "Density_Function")
ax1[0].set_xlabel('Average Latency (ms)')
ax1[0].set_ylabel('Amount of Municipalities')
ax2[0].set_ylabel('KDE Distribution')
ax1[0].legend(loc='upper right')
#ax2[0].legend(bbox_to_anchor=(1, 0.85))
#ax2[0].set_ylim(0, 0.007)
ax1[0].set_title("Average Fixed Latency Difference (Q3-Q1) Distribution")
ax1[1].hist(mx, bins= bins - offset, alpha=0.2, label='Average Latency Difference (Q3-Q1)')
ax2[1].plot(xx- offset, mkde(xx), label = "Density_Function")
ax1[1].set_xlabel('Average Latency (ms)')
ax1[1].set_ylabel('Amount of Municipalities')
ax2[1].set_ylabel('KDE Distribution')
ax1[1].legend(loc='upper right')
#ax2[1].legend(bbox_to_anchor=(1, 0.85))
#ax2[1].set_ylim(0,0.007)
ax1[1].set_title("Average Mobile Latency Difference (Q3-Q1) Distribution")
fig.suptitle("Average Latency Difference (Q3-Q1) Distribution per Type of Connectivity", fontsize=16)
plt.show()
The distribution plots do not show an evident deterioration in the mean or in the overall distribution when all municipalities are considered together. However, the charts of the KPI difference between Q3 and Q1 also show that a certain group of municipalities did deteriorate, especially in the uplink-speed and latency KPIs. This section plots the distribution of the percentage deterioration for each KPI.
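As a reference for how the percentage-difference columns used below (e.g. `%_dif_avg_d_q3_q1`) could be derived, here is a minimal sketch. It assumes these columns hold the relative change (Q3/Q1 - 1), which matches the sign conventions of the threshold filters later in the notebook; the column names follow the notebook's convention and the values are toy data, not the real dataset.

```python
import pandas as pd

# Toy frame with the notebook's quarterly-average column names (values illustrative only)
toy = pd.DataFrame({
    'avg_d_kbps_q1': [20000.0, 10000.0],
    'avg_d_kbps_q3': [15000.0, 15000.0],
})
# Relative change Q3 vs Q1; negative values mean the download KPI dropped in Q3
toy['%_dif_avg_d_q3_q1'] = toy['avg_d_kbps_q3'] / toy['avg_d_kbps_q1'] - 1
print(toy['%_dif_avg_d_q3_q1'].tolist())  # [-0.25, 0.5]
```

With this convention, a municipality whose download speed fell from 20 Mbps to 15 Mbps shows -0.25, i.e. a 25% deterioration.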
Although the datasets cover 967 municipalities for fixed networks and 948 for mobile networks, many municipalities include only a couple of tests and one or two unique devices. A minimum number of devices and tests per municipality must therefore be defined before any further analysis. The following graphs show the distribution of the number of devices per municipality, taking the minimum of unique devices between the Q1 and Q3 datasets.
The graphs show that 227 municipalities had 5 or fewer unique devices running speed tests in Q1 or Q3 for fixed networks, and 447 municipalities had 5 or fewer for mobile networks. Although 5 unique devices is far from an ideal sample size, raising the threshold would discard more than 50% of the mobile base of municipalities, so the analysis uses a minimum of 5 unique devices per municipality in both Q1 and Q3. Any municipality with fewer unique devices was discarded.
fig, ax1 = plt.subplots(1,2 , figsize=(23,6))
bins = np.linspace(0, 80, 100)
fx = fixed_analysis_no_geo[['devices_q1','devices_q3']].min(axis = 1)
mx = mobile_analysis_no_geo[['devices_q1','devices_q3']].min(axis = 1)
ax2 = np.array([a.twinx() for a in ax1.ravel()])
ax1[0].hist(fx, bins= bins, alpha=0.2, label='Min of Amount of Devices between Q1 and Q3')
ax2[0].hist(fx, bins= bins, histtype='step', cumulative=1)
ax1[0].set_xlabel('Amount of Devices that made a SpeedTest')
ax2[0].set_ylabel('Amount of Municipalities - Cumulative Distribution')
ax1[0].set_ylabel('Amount of Municipalities - Histogram')
ax1[0].legend(loc='upper left')
ax1[0].set_title("Fixed Min of Amount of Unique Devices with a Test per Municipality between Q1 and Q3", fontsize = 14)
ax1[0].set_xticks(np.arange(0, 80, step=5))
ax1[0].grid( linestyle='--')
ax1[1].hist(mx, bins= bins, alpha=0.2, label='Min of Amount of Devices between Q1 and Q3')
ax2[1].hist(mx, bins= bins, histtype='step', cumulative=1)
ax1[1].set_xlabel('Amount of Devices that made a SpeedTest')
ax2[1].set_ylabel('Amount of Municipalities - Cumulative Distribution')
ax1[1].set_ylabel('Amount of Municipalities - Histogram')
ax1[1].legend(loc='upper left')
ax1[1].set_title("Mobile Min of Amount of Unique Devices with a Test per Municipality between Q1 and Q3", fontsize = 14)
ax1[1].set_xticks(np.arange(0, 80, step=5))
ax1[1].grid( linestyle='--')
# Keep only municipalities with more than thr_dev unique devices in both Q1 and Q3
thr_dev_fix = 15
thr_dev_mob = 10
fixed_analysis_no_geo_dev = fixed_analysis_no_geo.loc[(fixed_analysis_no_geo['devices_q1'] > thr_dev_fix) & (fixed_analysis_no_geo['devices_q3'] > thr_dev_fix)].copy()
mobile_analysis_no_geo_dev = mobile_analysis_no_geo.loc[(mobile_analysis_no_geo['devices_q1'] > thr_dev_mob) & (mobile_analysis_no_geo['devices_q3'] > thr_dev_mob)].copy()
print("The shape of the fixed base excluding municipalities with less than", thr_dev_fix, "devices per quarter is: ", fixed_analysis_no_geo_dev.shape)
#display(fixed_analysis_no_geo_dev.head())
print("The shape of the mobile base excluding municipalities with less than", thr_dev_mob, "devices per quarter is: ", mobile_analysis_no_geo_dev.shape)
#display(mobile_analysis_no_geo_dev.head())
Just as my hypothesis predicted, even though the KPI distributions across all municipalities showed no significant differences between Q1 and Q3, there is a significant group of municipalities showing severe degradation:
I defined 30% as the threshold for significant KPI deterioration in the following steps of the analysis.
# Deterioration for fixed network base
deterioration_fixed = pd.DataFrame()
deterioration_fixed['Deterioriation_Threshold'] = np.arange(0.1, 0.85, step=0.05)
deterioration_fixed['Municipalities with DL deterioration'] = np.nan
deterioration_fixed['Municipalities with UL deterioration'] = np.nan
deterioration_fixed['Municipalities with Latency deterioration'] = np.nan
deterioration_fixed['Municipalities with deterioration in any KPI beyond threshold'] = np.nan
for index, row in deterioration_fixed.iterrows():
    # Write back with .at: assigning into the `row` copy returned by iterrows() does not modify the DataFrame
    thr = row['Deterioriation_Threshold']
    deterioration_fixed.at[index, 'Municipalities with DL deterioration'] = fixed_analysis_no_geo_dev.loc[fixed_analysis_no_geo_dev['%_dif_avg_d_q3_q1'] < -thr].shape[0]
    deterioration_fixed.at[index, 'Municipalities with UL deterioration'] = fixed_analysis_no_geo_dev.loc[fixed_analysis_no_geo_dev['%_dif_avg_u_q3_q1'] < -thr].shape[0]
    deterioration_fixed.at[index, 'Municipalities with Latency deterioration'] = fixed_analysis_no_geo_dev.loc[fixed_analysis_no_geo_dev['%_dif_avg_lat_q3_q1'] > thr].shape[0]
    deterioration_fixed.at[index, 'Municipalities with deterioration in any KPI beyond threshold'] = fixed_analysis_no_geo_dev.loc[(fixed_analysis_no_geo_dev['%_dif_avg_d_q3_q1'] < -thr) | (fixed_analysis_no_geo_dev['%_dif_avg_u_q3_q1'] < -thr) | (fixed_analysis_no_geo_dev['%_dif_avg_lat_q3_q1'] > thr)].shape[0]
deterioration_fixed.head()
# Deterioration for mobile network base
deterioration_mobile = pd.DataFrame()
deterioration_mobile['Deterioriation_Threshold'] = np.arange(0.10, 0.85, step=0.05)
deterioration_mobile['Municipalities with DL deterioration'] = np.nan
deterioration_mobile['Municipalities with UL deterioration'] = np.nan
deterioration_mobile['Municipalities with Latency deterioration'] = np.nan
deterioration_mobile['Municipalities with deterioration in any KPI beyond threshold'] = np.nan
for index, row in deterioration_mobile.iterrows():
    # Write back with .at: assigning into the `row` copy returned by iterrows() does not modify the DataFrame
    thr = row['Deterioriation_Threshold']
    deterioration_mobile.at[index, 'Municipalities with DL deterioration'] = mobile_analysis_no_geo_dev.loc[mobile_analysis_no_geo_dev['%_dif_avg_d_q3_q1'] < -thr].shape[0]
    deterioration_mobile.at[index, 'Municipalities with UL deterioration'] = mobile_analysis_no_geo_dev.loc[mobile_analysis_no_geo_dev['%_dif_avg_u_q3_q1'] < -thr].shape[0]
    deterioration_mobile.at[index, 'Municipalities with Latency deterioration'] = mobile_analysis_no_geo_dev.loc[mobile_analysis_no_geo_dev['%_dif_avg_lat_q3_q1'] > thr].shape[0]
    deterioration_mobile.at[index, 'Municipalities with deterioration in any KPI beyond threshold'] = mobile_analysis_no_geo_dev.loc[(mobile_analysis_no_geo_dev['%_dif_avg_d_q3_q1'] < -thr) | (mobile_analysis_no_geo_dev['%_dif_avg_u_q3_q1'] < -thr) | (mobile_analysis_no_geo_dev['%_dif_avg_lat_q3_q1'] > thr)].shape[0]
deterioration_mobile.head()
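As a cross-check, the same threshold sweep can also be written without `iterrows()`, using one boolean comparison per threshold. This sketch reuses the notebook's %-difference column names on a small toy frame; the values are illustrative only, not the real dataset.

```python
import numpy as np
import pandas as pd

# Toy %-difference frame (values illustrative only)
df = pd.DataFrame({
    '%_dif_avg_d_q3_q1':   [-0.40, -0.05, 0.20],
    '%_dif_avg_u_q3_q1':   [-0.50, -0.35, 0.00],
    '%_dif_avg_lat_q3_q1': [ 0.60,  0.05, 0.00],
})
thresholds = np.arange(0.10, 0.85, 0.05)
sweep = pd.DataFrame({'Threshold': thresholds})
# Count municipalities beyond each threshold: speeds deteriorate when the
# relative change is below -t, latency deteriorates when it is above +t
sweep['DL']  = [(df['%_dif_avg_d_q3_q1'] < -t).sum() for t in thresholds]
sweep['UL']  = [(df['%_dif_avg_u_q3_q1'] < -t).sum() for t in thresholds]
sweep['Lat'] = [(df['%_dif_avg_lat_q3_q1'] > t).sum() for t in thresholds]
sweep['Any'] = [((df['%_dif_avg_d_q3_q1'] < -t) |
                 (df['%_dif_avg_u_q3_q1'] < -t) |
                 (df['%_dif_avg_lat_q3_q1'] > t)).sum() for t in thresholds]
print(sweep.head(3))
```

The list comprehensions avoid the per-cell writes of the loop version and make the sign convention for each KPI explicit in one place.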
fig, ax1 = plt.subplots(1,2 , figsize=(23,6))
ax1[0].plot(deterioration_fixed['Deterioriation_Threshold'],deterioration_fixed['Municipalities with DL deterioration'], label= 'Number of municipalities with DL deterioration')
ax1[0].plot(deterioration_fixed['Deterioriation_Threshold'],deterioration_fixed['Municipalities with UL deterioration'], label= 'Number of municipalities with UL deterioration')
ax1[0].plot(deterioration_fixed['Deterioriation_Threshold'],deterioration_fixed['Municipalities with Latency deterioration'], label= 'Number of municipalities with Latency deterioration')
ax1[0].plot(deterioration_fixed['Deterioriation_Threshold'],deterioration_fixed['Municipalities with deterioration in any KPI beyond threshold'], label= 'Number of municipalities with deterioration in any KPI beyond threshold')
ax1[0].set_xticks(np.arange(0.1, 0.85, step=0.05))
ax1[0].set_yticks(np.arange(0, 450, step=50))
ax1[0].set_ylim(0, 450)
ax1[0].legend(loc='upper right')
ax1[0].grid()
ax1[0].set_xlabel('KPI deterioration (1- [KPI Q3 / KPI Q1])')
ax1[0].set_ylabel('Amount of Municipalities')
ax1[0].set_title("Fixed Network - Number of Municipalities vs KPI Deterioration", fontsize = 14)
ax1[1].plot(deterioration_mobile['Deterioriation_Threshold'],deterioration_mobile['Municipalities with DL deterioration'], label= 'Number of municipalities with DL deterioration')
ax1[1].plot(deterioration_mobile['Deterioriation_Threshold'],deterioration_mobile['Municipalities with UL deterioration'], label= 'Number of municipalities with UL deterioration')
ax1[1].plot(deterioration_mobile['Deterioriation_Threshold'],deterioration_mobile['Municipalities with Latency deterioration'], label= 'Number of municipalities with Latency deterioration')
ax1[1].plot(deterioration_mobile['Deterioriation_Threshold'],deterioration_mobile['Municipalities with deterioration in any KPI beyond threshold'], label= 'Number of municipalities with deterioration in any KPI beyond threshold')
ax1[1].set_xticks(np.arange(0.1 , 0.85, step=0.05))
ax1[1].set_yticks(np.arange(0, 450, step=50))
ax1[1].set_ylim(0, 450)
ax1[1].legend(loc='upper right')
ax1[1].grid()
ax1[1].set_xlabel('KPI deterioration (1- [KPI Q3 / KPI Q1])')
ax1[1].set_ylabel('Amount of Municipalities')
ax1[1].set_title("Mobile Network - Number of Municipalities vs KPI Deterioration", fontsize = 14)
kpi_thr = 0.30
fixed_analysis_no_geo_dev['Significant_deterioration'] = np.where(
    (fixed_analysis_no_geo_dev['%_dif_avg_d_q3_q1'] < -kpi_thr) | (fixed_analysis_no_geo_dev['%_dif_avg_u_q3_q1'] < -kpi_thr) | (fixed_analysis_no_geo_dev['%_dif_avg_lat_q3_q1'] > kpi_thr),  # condition: any KPI beyond the threshold
    1,  # value when the condition holds
    0)  # value otherwise
mobile_analysis_no_geo_dev['Significant_deterioration'] = np.where(
    (mobile_analysis_no_geo_dev['%_dif_avg_d_q3_q1'] < -kpi_thr) | (mobile_analysis_no_geo_dev['%_dif_avg_u_q3_q1'] < -kpi_thr) | (mobile_analysis_no_geo_dev['%_dif_avg_lat_q3_q1'] > kpi_thr),  # condition: any KPI beyond the threshold
    1,  # value when the condition holds
    0)  # value otherwise
display(mobile_analysis_no_geo_dev.head())
display(fixed_analysis_no_geo_dev.head())
features = ['MPIO_CCDGO', 'POPULATION_Q3', 'POP_DENSITY', 'GDP_CAPITA', 'FIX_INT_PENETRATION_Q3', '%_FTTH_Q3', '%_FTTX_Q3', '%_CABLE_Q3', '%_FWA_Q3', '%_COPPER_Q3', 'avg_d_kbps_q1', 'avg_u_kbps_q1', 'avg_lat_ms_q1', '%_dif_avg_d_q3_q1', '%_dif_avg_u_q3_q1', '%_dif_avg_lat_q3_q1', 'Significant_deterioration']
print(fixed_analysis_no_geo_dev[features].dtypes)
print(fixed_analysis_no_geo_dev['Significant_deterioration'].value_counts())
fixed_analysis_no_geo_dev[features].pivot_table(index = 'Significant_deterioration', values=features, aggfunc='mean')
plt.figure(figsize=(15,15))
sns.heatmap(fixed_analysis_no_geo_dev[features].corr(),annot=True)
plt.plot()
features = ['MPIO_CCDGO', 'POPULATION_Q3', 'POP_DENSITY', 'GDP_CAPITA', 'FIX_INT_PENETRATION_Q3', '%_FTTH_Q3', '%_FTTX_Q3', '%_CABLE_Q3', '%_FWA_Q3', '%_COPPER_Q3', 'avg_d_kbps_q1', 'avg_u_kbps_q1', 'avg_lat_ms_q1', 'Significant_deterioration']
sns.pairplot(fixed_analysis_no_geo_dev[features], hue = 'Significant_deterioration')
features = ['MPIO_CCDGO', 'POPULATION_Q3', 'POP_DENSITY', 'GDP_CAPITA', 'FIX_INT_PENETRATION_Q3', '%_FTTH_Q3', '%_FTTX_Q3', '%_CABLE_Q3', '%_FWA_Q3', '%_COPPER_Q3', 'avg_d_kbps_q1', 'avg_u_kbps_q1', 'avg_lat_ms_q1', '%_dif_avg_d_q3_q1', '%_dif_avg_u_q3_q1', '%_dif_avg_lat_q3_q1', '%_LTE_Q3', 'Significant_deterioration']
print(mobile_analysis_no_geo_dev[features].dtypes)
print(mobile_analysis_no_geo_dev['Significant_deterioration'].value_counts())
mobile_analysis_no_geo_dev[features].pivot_table(index = 'Significant_deterioration', values=features, aggfunc='mean')
plt.figure(figsize=(15,15))
sns.heatmap(mobile_analysis_no_geo_dev[features].corr(),annot=True)
plt.plot()
features = ['MPIO_CCDGO', 'POPULATION_Q3', 'POP_DENSITY', 'GDP_CAPITA', 'FIX_INT_PENETRATION_Q3', '%_FTTH_Q3', '%_FTTX_Q3', '%_CABLE_Q3', '%_FWA_Q3', '%_COPPER_Q3', 'avg_d_kbps_q1', 'avg_u_kbps_q1', 'avg_lat_ms_q1', '%_LTE_Q3', 'Significant_deterioration']
sns.pairplot(mobile_analysis_no_geo_dev[features], hue = 'Significant_deterioration')
fixed_analysis_no_geo_dev.to_csv("gs://imperial_analytics_project/backup/fixed_analysis_no_geo_dev.csv")
mobile_analysis_no_geo_dev.to_csv("gs://imperial_analytics_project/backup/mobile_analysis_no_geo_dev.csv")